PEP: 572
Title: Sublocal-Scoped Assignment Expressions
Author: Chris Angelico
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 28-Feb-2018
Python-Version: 3.8
Post-History: 28-Feb-2018, 02-Mar-2018, 23-Mar-2018


Abstract
========

This is a proposal for permitting temporary name bindings
which are limited to a single statement.


Rationale
=========

Programmers generally prefer reusing code rather than duplicating it. When
an expression needs to be used twice in quick succession but never again, it
is convenient to assign it to a temporary name with sublocal scope. By
permitting name bindings to exist within a single statement only, we make this
both convenient and safe against name collisions.

This is particularly notable in list/dict/set comprehensions and generator
expressions, where refactoring a subexpression into an assignment statement
is not possible. There are currently several ways to create a temporary name
binding inside a list comprehension, none of which is universally
accepted as ideal. A statement-local name allows any subexpression to be
temporarily captured and then used multiple times.

Additionally, this syntax can in places be used to remove the need to write an
infinite loop with a ``break`` in it. Capturing part of a ``while`` loop's
condition can improve the clarity of the loop header while still making the
actual value available within the loop body.


Syntax and semantics
====================

In any context where arbitrary Python expressions can be used, a **named
expression** can appear. This must be parenthesized for clarity, and is of
the form ``(expr as NAME)`` where ``expr`` is any valid Python expression,
and ``NAME`` is a simple name.

The value of such a named expression is the same as the incorporated
expression, with the additional side-effect that NAME is bound to that
value for the remainder of the current statement.

For example::

    # Similar to the boolean 'or' but checking for None specifically
    x = "default" if (spam().ham as eggs) is None else eggs

    # Even complex expressions can be built up piece by piece
    y = ((spam() as eggs), (eggs.method() as cheese), cheese[eggs])

Just as function-local names shadow global names for the scope of the
function, sublocal names shadow other names for that statement. (This
includes other sublocal names.)

Assignment to sublocal names is ONLY through this syntax. Regular
assignment to the same name will remove the sublocal name and
affect the name in the surrounding scope (function, class, or module).

Sublocal names never appear in locals() or globals(), and cannot be
closed over by nested functions.


Execution order and its consequences
------------------------------------

Since the sublocal name binding lasts from its point of execution
to the end of the current statement, this can potentially cause confusion
when the actual order of execution does not match the programmer's
expectations. Some examples::

    # A simple statement ends at the newline or semicolon.
    a = (1 as y)
    print(y) # NameError

    # The assignment ignores the SLNB - this adds one to 'a'
    a = (a + 1 as a)

    # Compound statements usually enclose everything...
    if (re.match(...) as m):
        print(m.groups(0))
    print(m) # NameError

    # ... except when function bodies are involved...
    if (input("> ") as cmd):
        def run_cmd():
            print("Running command", cmd) # NameError

    # ... but function *headers* are executed immediately
    if (input("> ") as cmd):
        def run_cmd(cmd=cmd): # Capture the value in the default arg
            print("Running command", cmd) # Works

Function bodies, in this respect, behave the same way they do in class scope;
assigned names are not closed over by method definitions. Defining a function
inside a loop already has potentially-confusing consequences, and sublocals
do not change the existing situation.
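
The class-scope behaviour used as the point of comparison here can be checked
in current Python. The following is a minimal sketch, not part of the
proposal: the class ``Commands`` and its attribute and method names are
hypothetical, and it uses ordinary class attributes rather than sublocal
names, which no released Python supports.

```python
class Commands:
    # A name assigned in the class body (hypothetical, for illustration).
    prompt = "> "

    def show_prompt(self):
        # Method bodies do not close over class-body names, so this lookup
        # skips the class scope and fails unless a global 'prompt' exists.
        try:
            return prompt
        except NameError:
            return "NameError"

    def show_prompt_default(self, prompt=prompt):
        # The default argument is evaluated in the class body, where the
        # name is visible, so the value is captured at definition time.
        return prompt


bare_result = Commands().show_prompt()
default_result = Commands().show_prompt_default()
print(bare_result)
print(default_result)
```

The default-argument capture mirrors the ``run_cmd(cmd=cmd)`` example above:
the header is evaluated where the name is in scope, while the body is not.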


Differences from regular assignment statements
----------------------------------------------

Using ``(EXPR as NAME)`` is similar to ``NAME = EXPR``, but has a number of
important distinctions.

* Assignment is a statement; sublocal creation is an expression whose value
  is the same as the object bound to the new name.
* Sublocals disappear at the end of their enclosing statement, at which point
  the name again refers to whatever it previously would have. Sublocals can
  thus shadow other names without conflict.
* Sublocals cannot be closed over by nested functions, and are completely
  ignored for this purpose.
* Sublocals do not appear in ``locals()`` or ``globals()``.
* A sublocal cannot be the target of any form of assignment, including
  augmented. Attempting to do so will remove the sublocal and assign to the
  fully-scoped name.

In many respects, a sublocal variable is akin to a local variable in an
imaginary nested function, except that the overhead of creating and calling
a function is bypassed. As with names bound by ``for`` loops inside list
comprehensions, sublocal names cannot "leak" into their surrounding scope.
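
The "imaginary nested function" analogy can be approximated in current Python
with an immediately-called ``lambda``. This is only a rough sketch of the
analogy, not the proposed semantics: the ``Spam`` class is a hypothetical
stand-in for the ``spam()`` object used in earlier examples, and the real
function call here carries exactly the overhead that sublocals are meant to
avoid.

```python
class Spam:
    """Hypothetical object with a 'ham' attribute, used only for illustration."""

    def __init__(self, ham):
        self.ham = ham


eggs = "outer value"  # a name in the surrounding scope

# Emulates: x = "default" if (Spam(None).ham as eggs) is None else eggs
# The lambda parameter 'eggs' plays the part of the sublocal name.
x = (lambda eggs: "default" if eggs is None else eggs)(Spam(None).ham)
y = (lambda eggs: "default" if eggs is None else eggs)(Spam("bacon").ham)

print(x)
print(y)
print(eggs)  # the outer name is shadowed inside the lambda, never rebound
```

After both statements the outer ``eggs`` still holds its original value,
which is the shadowing-without-leaking behaviour described in the list above.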


Recommended use-cases
=====================

Simplifying list comprehensions
-------------------------------

These list comprehensions are all approximately equivalent::

    # Calling the function twice
    stuff = [[f(x), x/f(x)] for x in range(5)]

    # External helper function
    def pair(x, value): return [value, x/value]
    stuff = [pair(x, f(x)) for x in range(5)]

    # Inline helper function
    stuff = [(lambda y: [y,x/y])(f(x)) for x in range(5)]

    # Extra 'for' loop - potentially could be optimized internally
    stuff = [[y, x/y] for x in range(5) for y in [f(x)]]

    # Iterating over a genexp
    stuff = [[y, x/y] for x, y in ((x, f(x)) for x in range(5))]

    # Expanding the comprehension into a loop
    stuff = []
    for x in range(5):
        y = f(x)
        stuff.append([y, x/y])

    # Wrapping the loop in a generator function
    def g():
        for x in range(5):
            y = f(x)
            yield [y, x/y]
    stuff = list(g())

    # Using a mutable cache object (various forms possible)
    c = {}
    stuff = [[c.update(y=f(x)) or c['y'], x/c['y']] for x in range(5)]

    # Using a sublocal name
    stuff = [[(f(x) as y), x/y] for x in range(5)]

If calling ``f(x)`` is expensive or has side effects, the clean operation of
the list comprehension gets muddled. Using a short-duration name binding
retains the simplicity; while the extra ``for`` loop does achieve this, it
does so at the cost of dividing the expression visually, putting the named
part at the end of the comprehension instead of the beginning.
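
The cost of the first spelling can be made visible in current Python by
counting calls. In this sketch ``f`` and ``call_log`` are hypothetical
stand-ins (the PEP does not define ``f``); the function simply records each
call so the two spellings can be compared.

```python
call_log = []


def f(x):
    """Hypothetical stand-in for an expensive function; records each call."""
    call_log.append(x)
    return x + 1  # never zero for x in range(5), so x / f(x) is safe


# Calling the function twice per element
twice = [[f(x), x / f(x)] for x in range(5)]
calls_twice = len(call_log)

# Extra 'for' loop binding the result to a temporary name
call_log.clear()
once = [[y, x / y] for x in range(5) for y in [f(x)]]
calls_once = len(call_log)

print(twice == once)
print(calls_twice, calls_once)
```

Both spellings build the same list, but the first evaluates ``f`` twice for
every element, which matters when the call is expensive or has side effects.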


Capturing condition values
--------------------------

Since a sublocal created by an assignment expression extends to the full
current statement, even a block statement, this can be used to good effect
in the header of an ``if`` or ``while`` statement::

    # Current Python, not caring about function return value
    while input("> ") != "quit":
        print("You entered a command.")

    # Current Python, capturing return value - four-line loop header
    while True:
        command = input("> ")
        if command == "quit":
            break
        print("You entered:", command)

    # Proposed alternative to the above
    while (input("> ") as command) != "quit":
        print("You entered:", command)

    # Capturing regular expression match objects
    # See, for instance, Lib/pydoc.py, which uses a multiline spelling
    # of this effect
    if (re.search(pat, text) as match):
        print("Found:", match.group(0))

    # Reading socket data until an empty string is returned
    while (sock.read() as data):
        print("Received data:", data)

Particularly with the ``while`` loop, this can remove the need to have an
infinite loop, an assignment, and a condition. It also creates a smooth
parallel between a loop which simply uses a function call as its condition,
and one which uses that as its condition but also uses the actual value.


Preventing temporaries from leaking
-----------------------------------

Inside a class definition, any name assigned to will become a class attribute.
Use of a sublocal name binding will prevent temporary variables from becoming
public attributes of the class.

(TODO: Get example)


Performance costs
=================

The cost of sublocals must be kept to a minimum, particularly when they are not
used; normal assignment should not be measurably penalized. The reference
implementation uses a linked list of sublocal cells, with the absence of such
a list being the normal case. This is used for code compilation only; once a
function's bytecode has been baked in, execution of that bytecode has no
performance cost compared to regular assignment.

Other Python implementations may choose to do things differently, but a zero
run-time cost is strongly recommended, as is a minimal compile-time cost in
the case where no sublocal names are used.


Forbidden special cases
=======================

In two situations, the use of statement-local name bindings (SLNBs) makes no
sense, and could be confusing due to the ``as`` keyword already having a
different meaning in the same context.

1. Exception catching::

       try:
           ...
       except (Exception as e1) as e2:
           ...

   The expression ``(Exception as e1)`` has the value ``Exception``, and
   creates an SLNB ``e1 = Exception``. This is generally useless, and creates
   the potential confusion in that these two statements do quite different
   things::

       except (Exception as e1):
       except Exception as e2:

   The latter captures the exception **instance**, while the former captures
   the ``Exception`` **type** (not the type of the raised exception).

2. Context managers::

       lock = threading.Lock()
       with (lock as l) as m:
           ...

   This captures the original Lock object as ``l``, and the result of calling
   its ``__enter__`` method as ``m``. As with ``except`` statements, this
   creates a situation in which parenthesizing an expression subtly changes
   its semantics, with the additional pitfall that this will frequently work
   (when ``x.__enter__()`` returns x, eg with file objects).

Both of these are forbidden; creating SLNBs in the headers of these statements
will result in a SyntaxError.


Rejected alternative proposals
==============================

Proposals broadly similar to this one have come up frequently on python-ideas.
Below are a number of alternative syntaxes, some of them specific to
comprehensions, which have been rejected in favour of the one given above.


Alternative spellings
---------------------

Broadly the same semantics as the current proposal, but spelled differently.

1. ``EXPR as NAME`` without parentheses::

       stuff = [[f(x) as y, x/y] for x in range(5)]

   Omitting the parentheses from this PEP's proposed syntax introduces many
   syntactic ambiguities. Requiring them in all contexts leaves open the
   option to make them optional in specific situations where the syntax is
   unambiguous (cf generator expressions as sole parameters in function
   calls), but there is no plausible way to make them optional everywhere.

2. Adorning statement-local names with a leading dot::

       stuff = [[(f(x) as .y), x/.y] for x in range(5)]

   This has the advantage that leaked usage can be readily detected, removing
   some forms of syntactic ambiguity. However, this would be the only place
   in Python where a variable's scope is encoded into its name, making
   refactoring harder. This syntax is quite viable, and could be promoted to
   become the current recommendation if its advantages are found to outweigh
   its cost.

3. Adding a ``where:`` to any statement to create local name bindings::

       value = x**2 + 2*x where:
           x = spam(1, 4, 7, q)

   Execution order is inverted (the indented body is performed first, followed
   by the "header"). This requires a new keyword, unless an existing keyword
   is repurposed (most likely ``with:``).


Special-casing conditional statements
-------------------------------------

One of the most popular use-cases is ``if`` and ``while`` statements. Instead
of a more general solution, this proposal enhances the syntax of these two
statements to add a means of capturing the compared value::

    if re.search(pat, text) as match:
        print("Found:", match.group(0))

This works beautifully if and ONLY if the desired condition is based on the
truthiness of the captured value. It is thus effective for specific
use-cases (regex matches, socket reads that return ``''`` when done), and
completely useless in more complicated cases (eg where the condition is
``f(x) < 0`` and you want to capture the value of ``f(x)``). It also has
no benefit to list comprehensions.

Advantages: No syntactic ambiguities. Disadvantages: Answers only a fraction
of possible use-cases, even in ``if``/``while`` statements.


Special-casing comprehensions
-----------------------------

Another common use-case is comprehensions (list/set/dict, and genexps). As
above, proposals have been made for comprehension-specific solutions.

1. ``where``, ``let``, or ``given``::

       stuff = [(y, x/y) where y = f(x) for x in range(5)]
       stuff = [(y, x/y) let y = f(x) for x in range(5)]
       stuff = [(y, x/y) given y = f(x) for x in range(5)]

   This brings the subexpression to a location in between the 'for' loop and
   the expression. It introduces an additional language keyword, which creates
   conflicts. Of the three, ``where`` reads the most cleanly, but also has the
   greatest potential for conflict (eg SQLAlchemy and numpy have ``where``
   methods, as does ``tkinter.dnd.Icon`` in the standard library).

2. ``with NAME = EXPR``::

       stuff = [(y, x/y) with y = f(x) for x in range(5)]

   As above, but reusing the ``with`` keyword. Doesn't read too badly, and
   needs no additional language keyword. Is restricted to comprehensions,
   though, and cannot as easily be transformed into "longhand" for-loop
   syntax. Has the C problem that an equals sign in an expression can now
   create a name binding, rather than performing a comparison. Would raise
   the question of why "with NAME = EXPR:" cannot be used as a statement on
   its own.

3. ``with EXPR as NAME``::

       stuff = [(y, x/y) with f(x) as y for x in range(5)]

   As per option 2, but using ``as`` rather than an equals sign. Aligns
   syntactically with other uses of ``as`` for name binding, but a simple
   transformation to for-loop longhand would create drastically different
   semantics; the meaning of ``with`` inside a comprehension would be
   completely different from the meaning as a stand-alone statement, while
   retaining identical syntax.
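
The difference in meaning noted for option 3 can be seen in how the existing
``with`` statement binds its name. The sketch below runs in current Python;
the ``Resource`` class is hypothetical and deliberately returns something
other than itself from ``__enter__`` to make the distinction obvious.

```python
class Resource:
    """Hypothetical context manager whose __enter__ returns another object."""

    def __enter__(self):
        return "value from __enter__"

    def __exit__(self, exc_type, exc, tb):
        return False  # do not suppress exceptions


resource = Resource()

# As a statement, 'with EXPR as NAME' binds the result of EXPR.__enter__(),
# not the value of EXPR itself.
with resource as bound:
    bound_in_statement = bound

print(bound_in_statement)
print(bound_in_statement is resource)
```

A comprehension-level ``with f(x) as y`` that bound ``y`` to ``f(x)`` directly
would therefore share the statement's spelling while binding a different
value.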

Regardless of the spelling chosen, this introduces a stark difference between
comprehensions and the equivalent unrolled long-hand form of the loop. It is
no longer possible to unwrap the loop into statement form without reworking
any name bindings. The only keyword that can be repurposed to this task is
``with``, thus giving it sneakily different semantics in a comprehension than
in a statement; alternatively, a new keyword is needed, with all the costs
therein.


Assignment expressions
======================

Rather than creating a statement-local name, these forms of name binding have
the exact same semantics as regular assignment: bind to a local name unless
there's a ``global`` or ``nonlocal`` declaration.

Syntax options:

1. ``(EXPR as NAME)`` as per the promoted proposal
2. C-style ``NAME = EXPR`` in any context
3. A new and dedicated operator with C-like semantics ``NAME := EXPR``

The C syntax has been long known to be a bug magnet. The syntactic similarity
between ``if (x == y)`` and ``if (x = y)`` belies their drastically different
semantics. While this can be mitigated with good tools, such tools would need
to be deployed for Python, and even with perfect tooling, one-character bugs
will still happen. Creating a new operator mitigates this, but creates a
disconnect between regular assignment statements and these new assignment
expressions, or would result in the old syntax being a short-hand usable in
certain situations only.

Regardless of the syntax, all of these have the problem that wide-scope names
can be assigned to from an expression. This creates strange edge cases and
unexpected behaviour, such as::

    # Name bindings inside list comprehensions usually won't leak
    x = [(y as local) for z in iter]
    # But occasionally they will!
    x = [y for z in (iter as leaky)]

    # Function default arguments are evaluated in the surrounding scope,
    # not the enclosing scope
    def x(y = (1 as z)):
        # z here is closing over the outer variable
    # z is a regular variable here

    # Assignment targets are evaluated after the values to be assigned
    x[y] = f((1 as y))

The same peculiarities can be seen with function calls and global/nonlocal
declarations, but will become considerably more likely to occur.


Other uses of sublocals
=======================

Once sublocal name bindings exist as a concept, they could potentially be used
in additional ways.


Exception catching
------------------

Currently, ``except Exception as e:`` binds to a regular (usually local) name,
and then unbinds this name. This could be changed to bind to a sublocal name
whose scope ends at the end of the except block.


List/set/dict comprehensions
----------------------------

Rather than create an entire function scope, a comprehension could create
subscopes for the names it binds to. They would thus be protected against name
leakage just as they are today, but without the edge cases around class scope
and name references.


References
==========

.. [1] Proof of concept / reference implementation
   (https://github.com/Rosuav/cpython/tree/statement-local-variables)


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: