490 lines
18 KiB
ReStructuredText
490 lines
18 KiB
ReStructuredText
PEP: 572
|
||
Title: Sublocal-Scoped Assignment Expressions
|
||
Author: Chris Angelico <rosuav@gmail.com>
|
||
Status: Draft
|
||
Type: Standards Track
|
||
Content-Type: text/x-rst
|
||
Created: 28-Feb-2018
|
||
Python-Version: 3.8
|
||
Post-History: 28-Feb-2018, 02-Mar-2018, 23-Mar-2018
|
||
|
||
|
||
Abstract
|
||
========
|
||
|
||
This is a proposal for permitting temporary name bindings
|
||
which are limited to a single statement.
|
||
|
||
|
||
Rationale
|
||
=========
|
||
|
||
Programmers generally prefer reusing code rather than duplicating it. When
|
||
an expression needs to be used twice in quick succession but never again,
|
||
it is convenient to assign it to a temporary name with sublocal scope.
|
||
By permitting name bindings to exist within a single statement only, we
|
||
make this both convenient and safe against name collisions.
|
||
|
||
This is particularly notable in list/dict/set comprehensions and generator
|
||
expressions, where refactoring a subexpression into an assignment statement
|
||
is not possible. There are currently several ways to create a temporary name
|
||
binding inside a list comprehension, none of which is universally
|
||
accepted as ideal. A statement-local name allows any subexpression to be
|
||
temporarily captured and then used multiple times.
|
||
|
||
Additionally, this syntax can in places be used to remove the need to write an
|
||
infinite loop with a ``break`` in it. Capturing part of a ``while`` loop's
|
||
condition can improve the clarity of the loop header while still making the
|
||
actual value available within the loop body.
|
||
|
||
|
||
Syntax and semantics
|
||
====================
|
||
|
||
In any context where arbitrary Python expressions can be used, a **named
|
||
expression** can appear. This must be parenthesized for clarity, and is of
|
||
the form ``(expr as NAME)`` where ``expr`` is any valid Python expression,
|
||
and ``NAME`` is a simple name.
|
||
|
||
The value of such a named expression is the same as the incorporated
|
||
expression, with the additional side-effect that NAME is bound to that
|
||
value for the remainder of the current statement. For example::
|
||
|
||
# Similar to the boolean 'or' but checking for None specifically
|
||
x = "default" if (spam().ham as eggs) is None else eggs
|
||
|
||
# Even complex expressions can be built up piece by piece
|
||
y = ((spam() as eggs), (eggs.method() as cheese), cheese[eggs])
|
||
|
||
Just as function-local names shadow global names for the scope of the
|
||
function, sublocal names shadow other names for that statement. (This
|
||
includes other sublocal names.)
|
||
|
||
Assignment to sublocal names is ONLY through this syntax. Regular
|
||
assignment to the same name will remove the sublocal name and
|
||
affect the name in the surrounding scope (function, class, or module).
|
||
|
||
Sublocal names never appear in locals() or globals(), and cannot be
|
||
closed over by nested functions.
|
||
|
||
|
||
Execution order and its consequences
|
||
------------------------------------
|
||
|
||
Since the sublocal name binding lasts from its point of execution
|
||
to the end of the current statement, this can potentially cause confusion
|
||
when the actual order of execution does not match the programmer's
|
||
expectations. Some examples::
|
||
|
||
# A simple statement ends at the newline or semicolon.
|
||
a = (1 as y)
|
||
print(y) # NameError
|
||
|
||
# The assignment ignores the SLNB - this adds one to 'a'
|
||
a = (a + 1 as a)
|
||
|
||
# Compound statements usually enclose everything...
|
||
if (re.match(...) as m):
|
||
print(m.groups(0))
|
||
print(m) # NameError
|
||
|
||
# ... except when function bodies are involved...
|
||
if (input("> ") as cmd):
|
||
def run_cmd():
|
||
print("Running command", cmd) # NameError
|
||
|
||
# ... but function *headers* are executed immediately
|
||
if (input("> ") as cmd):
|
||
def run_cmd(cmd=cmd): # Capture the value in the default arg
|
||
print("Running command", cmd) # Works
|
||
|
||
Function bodies, in this respect, behave the same way they do in class scope;
|
||
assigned names are not closed over by method definitions. Defining a function
|
||
inside a loop already has potentially-confusing consequences, and sublocals
|
||
do not change the existing situation.
|
||
|
||
|
||
Differences from regular assignment statements
|
||
----------------------------------------------
|
||
|
||
Using ``(EXPR as NAME)`` is similar to ``NAME = EXPR``, but has a number of
|
||
important distinctions.
|
||
|
||
* Assignment is a statement; sublocal creaation is an expression whose value
|
||
is the same as the object bound to the new name.
|
||
* Sublocals disappear at the end of their enclosing statement, at which point
|
||
the name again refers to whatever it previously would have. Sublocals can
|
||
thus shadow other names without conflict.
|
||
* Sublocals cannot be closed over by nested functions, and are completely
|
||
ignored for this purpose.
|
||
* Sublocals do not appear in ``locals()`` or ``globals()``.
|
||
* A sublocal cannot be the target of any form of assignment, including
|
||
augmented. Attempting to do so will remove the sublocal and assign to the
|
||
fully-scoped name.
|
||
|
||
In many respects, a sublocal variable is akin to a local variable in an
|
||
imaginary nested function, except that the overhead of creating and calling
|
||
a function is bypassed. As with names bound by ``for`` loops inside list
|
||
comprehensions, sublocal names cannot "leak" into their surrounding scope.
|
||
|
||
|
||
Recommended use-cases
|
||
=====================
|
||
|
||
Simplifying list comprehensions
|
||
-------------------------------
|
||
|
||
These list comprehensions are all approximately equivalent::
|
||
|
||
# Calling the function twice
|
||
stuff = [[f(x), x/f(x)] for x in range(5)]
|
||
|
||
# External helper function
|
||
def pair(x, value): return [value, x/value]
|
||
stuff = [pair(x, f(x)) for x in range(5)]
|
||
|
||
# Inline helper function
|
||
stuff = [(lambda y: [y,x/y])(f(x)) for x in range(5)]
|
||
|
||
# Extra 'for' loop - potentially could be optimized internally
|
||
stuff = [[y, x/y] for x in range(5) for y in [f(x)]]
|
||
|
||
# Iterating over a genexp
|
||
stuff = [[y, x/y] for x, y in ((x, f(x)) for x in range(5))]
|
||
|
||
# Expanding the comprehension into a loop
|
||
stuff = []
|
||
for x in range(5):
|
||
y = f(x)
|
||
stuff.append([y, x/y])
|
||
|
||
# Wrapping the loop in a generator function
|
||
def g():
|
||
for x in range(5):
|
||
y = f(x)
|
||
yield [y, x/y]
|
||
stuff = list(g())
|
||
|
||
# Using a mutable cache object (various forms possible)
|
||
c = {}
|
||
stuff = [[c.update(y=f(x)) or c['y'], x/c['y']] for x in range(5)]
|
||
|
||
# Using a sublocal name
|
||
stuff = [[(f(x) as y), x/y] for x in range(5)]
|
||
|
||
If calling ``f(x)`` is expensive or has side effects, the clean operation of
|
||
the list comprehension gets muddled. Using a short-duration name binding
|
||
retains the simplicity; while the extra ``for`` loop does achieve this, it
|
||
does so at the cost of dividing the expression visually, putting the named
|
||
part at the end of the comprehension instead of the beginning.
|
||
|
||
|
||
Capturing condition values
|
||
--------------------------
|
||
|
||
Since a sublocal created by an assignment expression extends to the full
|
||
current statement, even a block statement, this can be used to good effect
|
||
in the header of an ``if`` or ``while`` statement::
|
||
|
||
# Current Python, not caring about function return value
|
||
while input("> ") != "quit":
|
||
print("You entered a command.")
|
||
|
||
# Current Python, capturing return value - four-line loop header
|
||
while True:
|
||
command = input("> ");
|
||
if command == "quit":
|
||
break
|
||
print("You entered:", command)
|
||
|
||
# Proposed alternative to the above
|
||
while (input("> ") as command) != "quit":
|
||
print("You entered:", command)
|
||
|
||
# Capturing regular expression match objects
|
||
# See, for instance, Lib/pydoc.py, which uses a multiline spelling
|
||
# of this effect
|
||
if (re.search(pat, text) as match):
|
||
print("Found:", match.group(0))
|
||
|
||
# Reading socket data until an empty string is returned
|
||
while (sock.read() as data):
|
||
print("Received data:", data)
|
||
|
||
Particularly with the ``while`` loop, this can remove the need to have an
|
||
infinite loop, an assignment, and a condition. It also creates a smooth
|
||
parallel between a loop which simply uses a function call as its condition,
|
||
and one which uses that as its condition but also uses the actual value.
|
||
|
||
|
||
Preventing temporaries from leaking
|
||
-----------------------------------
|
||
|
||
Inside a class definition, any name assigned to will become a class attribute.
|
||
Use of a sublocal name binding will prevent temporary variables from becoming
|
||
public attributes of the class.
|
||
|
||
(TODO: Get example)
|
||
|
||
|
||
Performance costs
|
||
=================
|
||
|
||
The cost of sublocals must be kept to a minimum, particularly when they are not
|
||
used; normal assignment should not be measurably penalized. The reference
|
||
implementation uses a linked list of sublocal cells, with the absence of such
|
||
a list being the normal case. This is used for code compilation only; once a
|
||
function's bytecode has been baked in, execution of that bytecode has no
|
||
performance cost compared to regular assignment.
|
||
|
||
Other Python implementations may choose to do things differently, but a zero
|
||
run-time cost is strongly recommended, as is a minimal compile-time cost in
|
||
the case where no sublocal names are used.
|
||
|
||
|
||
Forbidden special cases
|
||
=======================
|
||
|
||
In two situations, the use of SLNBs makes no sense, and could be confusing due
|
||
to the ``as`` keyword already having a different meaning in the same context.
|
||
|
||
1. Exception catching::
|
||
|
||
try:
|
||
...
|
||
except (Exception as e1) as e2:
|
||
...
|
||
|
||
The expression ``(Exception as e1)`` has the value ``Exception``, and
|
||
creates an SLNB ``e1 = Exception``. This is generally useless, and creates
|
||
the potential confusion in that these two statements do quite different
|
||
things:
|
||
|
||
except (Exception as e1):
|
||
except Exception as e2:
|
||
|
||
The latter captures the exception **instance**, while the former captures
|
||
the ``Exception`` **type** (not the type of the raised exception).
|
||
|
||
2. Context managers::
|
||
|
||
lock = threading.Lock()
|
||
with (lock as l) as m:
|
||
...
|
||
|
||
This captures the original Lock object as ``l``, and the result of calling
|
||
its ``__enter__`` method as ``m``. As with ``except`` statements, this
|
||
creates a situation in which parenthesizing an expression subtly changes
|
||
its semantics, with the additional pitfall that this will frequently work
|
||
(when ``x.__enter__()`` returns x, eg with file objects).
|
||
|
||
Both of these are forbidden; creating SLNBs in the headers of these statements
|
||
will result in a SyntaxError.
|
||
|
||
|
||
Rejected alternative proposals
|
||
==============================
|
||
|
||
Proposals broadly similar to this one have come up frequently on python-ideas.
|
||
Below are a number of alternative syntaxes, some of them specific to
|
||
comprehensions, which have been rejected in favour of the one given above.
|
||
|
||
|
||
Alternative spellings
|
||
---------------------
|
||
|
||
Broadly the same semantics as the current proposal, but spelled differently.
|
||
|
||
1. ``EXPR as NAME`` without parentheses::
|
||
|
||
stuff = [[f(x) as y, x/y] for x in range(5)]
|
||
|
||
Omitting the parentheses from this PEP's proposed syntax introduces many
|
||
syntactic ambiguities. Requiring them in all contexts leaves open the
|
||
option to make them optional in specific situations where the syntax is
|
||
unambiguous (cf generator expressions as sole parameters in function
|
||
calls), but there is no plausible way to make them optional everywhere.
|
||
|
||
2. Adorning statement-local names with a leading dot::
|
||
|
||
stuff = [[(f(x) as .y), x/.y] for x in range(5)]
|
||
|
||
This has the advantage that leaked usage can be readily detected, removing
|
||
some forms of syntactic ambiguity. However, this would be the only place
|
||
in Python where a variable's scope is encoded into its name, making
|
||
refactoring harder. This syntax is quite viable, and could be promoted to
|
||
become the current recommendation if its advantages are found to outweigh
|
||
its cost.
|
||
|
||
3. Adding a ``where:`` to any statement to create local name bindings::
|
||
|
||
value = x**2 + 2*x where:
|
||
x = spam(1, 4, 7, q)
|
||
|
||
Execution order is inverted (the indented body is performed first, followed
|
||
by the "header"). This requires a new keyword, unless an existing keyword
|
||
is repurposed (most likely ``with:``).
|
||
|
||
|
||
Special-casing conditional statements
|
||
-------------------------------------
|
||
|
||
One of the most popular use-cases is ``if`` and ``while`` statements. Instead
|
||
of a more general solution, this proposal enhances the syntax of these two
|
||
statements to add a means of capturing the compared value::
|
||
|
||
if re.search(pat, text) as match:
|
||
print("Found:", match.group(0))
|
||
|
||
This works beautifully if and ONLY if the desired condition is based on the
|
||
truthiness of the captured value. It is thus effective for specific
|
||
use-cases (regex matches, socket reads that return `''` when done), and
|
||
completely useless in more complicated cases (eg where the condition is
|
||
``f(x) < 0`` and you want to capture the value of ``f(x)``). It also has
|
||
no benefit to list comprehensions.
|
||
|
||
Advantages: No syntactic ambiguities. Disadvantages: Answers only a fraction
|
||
of possible use-cases, even in ``if``/``while`` statements.
|
||
|
||
|
||
Special-casing comprehensions
|
||
-----------------------------
|
||
|
||
Another common use-case is comprehensions (list/set/dict, and genexps). As
|
||
above, proposals have been made for comprehension-specific solutions.
|
||
|
||
1. ``where``, ``let``, or ``given``::
|
||
|
||
stuff = [(y, x/y) where y = f(x) for x in range(5)]
|
||
stuff = [(y, x/y) let y = f(x) for x in range(5)]
|
||
stuff = [(y, x/y) given y = f(x) for x in range(5)]
|
||
|
||
This brings the subexpression to a location in between the 'for' loop and
|
||
the expression. It introduces an additional language keyword, which creates
|
||
conflicts. Of the three, ``where`` reads the most cleanly, but also has the
|
||
greatest potential for conflict (eg SQLAlchemy and numpy have ``where``
|
||
methods, as does ``tkinter.dnd.Icon`` in the standard library).
|
||
|
||
2. ``with NAME = EXPR``::
|
||
|
||
stuff = [(y, x/y) with y = f(x) for x in range(5)]
|
||
|
||
As above, but reusing the `with` keyword. Doesn't read too badly, and needs
|
||
no additional language keyword. Is restricted to comprehensions, though,
|
||
and cannot as easily be transformed into "longhand" for-loop syntax. Has
|
||
the C problem that an equals sign in an expression can now create a name
|
||
binding, rather than performing a comparison. Would raise the question of
|
||
why "with NAME = EXPR:" cannot be used as a statement on its own.
|
||
|
||
3. ``with EXPR as NAME``::
|
||
|
||
stuff = [(y, x/y) with f(x) as y for x in range(5)]
|
||
|
||
As per option 2, but using ``as`` rather than an equals sign. Aligns
|
||
syntactically with other uses of ``as`` for name binding, but a simple
|
||
transformation to for-loop longhand would create drastically different
|
||
semantics; the meaning of ``with`` inside a comprehension would be
|
||
completely different from the meaning as a stand-alone statement, while
|
||
retaining identical syntax.
|
||
|
||
Regardless of the spelling chosen, this introduces a stark difference between
|
||
comprehensions and the equivalent unrolled long-hand form of the loop. It is
|
||
no longer possible to unwrap the loop into statement form without reworking
|
||
any name bindings. The only keyword that can be repurposed to this task is
|
||
``with``, thus giving it sneakily different semantics in a comprehension than
|
||
in a statement; alternatively, a new keyword is needed, with all the costs
|
||
therein.
|
||
|
||
Assignment expressions
|
||
======================
|
||
|
||
Rather than creating a statement-local name, these forms of name binding have
|
||
the exact same semantics as regular assignment: bind to a local name unless
|
||
there's a ``global`` or ``nonlocal`` declaration.
|
||
|
||
Syntax options:
|
||
|
||
1. ``(EXPR as NAME)`` as per the promoted proposal
|
||
|
||
2. C-style ``NAME = EXPR`` in any context
|
||
|
||
3. A new and dedicated operator with C-like semantics ``NAME := EXPR``
|
||
|
||
The C syntax has been long known to be a bug magnet. The syntactic similarity
|
||
between ``if (x == y)`` and ``if (x = y)`` belies their drastically different
|
||
semantics. While this can be mitigated with good tools, such tools would need
|
||
to be deployed for Python, and even with perfect tooling, one-character bugs
|
||
will still happen. Creating a new operator mitigates this, but creates a
|
||
disconnect between regular assignment statements and these new assignment
|
||
expressions, or would result in the old syntax being a short-hand usable in
|
||
certain situations only.
|
||
|
||
Regardless of the syntax, all of these have the problem that wide-scope names
|
||
can be assigned to from an expression. This creates strange edge cases and
|
||
unexpected behaviour, such as::
|
||
|
||
# Name bindings inside list comprehensions usually won't leak
|
||
x = [(y as local) for z in iter]
|
||
# But occasionally they will!
|
||
x = [y for z in (iter as leaky)]
|
||
|
||
# Function default arguments are evaluated in the surrounding scope,
|
||
# not the enclosing scope
|
||
def x(y = (1 as z)):
|
||
# z here is closing over the outer variable
|
||
# z is a regular variable here
|
||
|
||
# Assignment targets are evaluated after the values to be assigned
|
||
x[y] = f((1 as y))
|
||
|
||
The same peculiarities can be seen with function calls and global/nonlocal
|
||
declarations, but will become considerably more likely to occur.
|
||
|
||
|
||
Other uses of sublocals
|
||
=======================
|
||
|
||
Once sublocal name bindings exist as a concept, they could potentially be
|
||
used in additional ways.
|
||
|
||
|
||
Exception catching
|
||
------------------
|
||
|
||
Currently, ``except Exception as e:`` binds to a regular (usually local) name,
|
||
and then unbinds this name. This could be changed to bind to a sublocal name
|
||
whose scope ends at the end of the except block.
|
||
|
||
|
||
List/set/dict comprehensions
|
||
----------------------------
|
||
|
||
Rather than create an entire function scope, a comprehension could create
|
||
subscopes for the names it binds to. They would thus be protected against
|
||
name leakage just as they are today, but without the edge cases around
|
||
class scope and name references.
|
||
|
||
|
||
References
|
||
==========
|
||
|
||
.. [1] Proof of concept / reference implementation
|
||
(https://github.com/Rosuav/cpython/tree/statement-local-variables)
|
||
|
||
|
||
Copyright
|
||
=========
|
||
|
||
This document has been placed in the public domain.
|
||
|
||
|
||
|
||
..
|
||
Local Variables:
|
||
mode: indented-text
|
||
indent-tabs-mode: nil
|
||
sentence-end-double-space: t
|
||
fill-column: 70
|
||
coding: utf-8
|
||
End:
|