PEP: 3150
Title: Statement local namespaces (aka "given" clause)
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan
Status: Deferred
Type: Standards Track
Content-Type: text/x-rst
Created: 2010-07-09
Python-Version: 3.3
Post-History: 2010-07-14, 2011-04-21, 2011-06-13
Resolution: TBD


Abstract
========

This PEP proposes the addition of an optional ``given`` clause to several
Python statements that do not currently have an associated code suite. This
clause will create a statement local namespace for additional names that are
accessible in the associated statement, but do not become part of the
containing namespace.

The primary motivation is to elevate ordinary assignment statements to be
on par with ``class`` and ``def`` statements where the name of the item to
be defined is presented to the reader in advance of the details of how the
value of that item is calculated. A secondary motivation is to simplify
interim calculations in module and class level code without polluting the
resulting namespaces.

There are additional emergent properties of the proposed solution which may
be of interest to some users. Most notably, it is proposed that this clause
use a new kind of scope that performs early binding of variables, potentially
replacing other techniques that achieve the same effect (such as the "default
argument hack").

The specific proposal in this PEP has been informed by various explorations
of this and related concepts over the years (e.g. [1]_, [2]_, [3]_, [6]_),
and is inspired to some degree by the ``where`` and ``let`` clauses in
Haskell. It avoids some problems that have been identified in past proposals,
but has not yet itself been subject to the test of implementation.


PEP Deferral
============

Despite the lifting of the language moratorium (PEP 3003) for Python 3.3,
this PEP currently remains in a Deferred state. That means the PEP has to
pass at least *two* hurdles to become part of 3.3.
Firstly, I personally have to be sufficiently convinced of the PEP's value and
feasibility to return it to Draft status. While I do see merit in the concept
of statement local namespaces (otherwise I wouldn't have spent so much time
pondering the idea over the years), I also have grave doubts as to the wisdom
of actually adding it to the language (see "Key Concern" below).

Secondly, Guido van Rossum (or his delegate) will need to accept the PEP. At
the very least, that will not occur until a fully functional draft
implementation for CPython is available, and the other three major Python
implementations (PyPy, Jython, IronPython) have indicated that they consider
it feasible to implement the proposed semantics once they reach the point of
targeting 3.3 compatibility. Input from related projects with a vested
interest in Python's syntax (e.g. Cython) will also be valuable.


Proposal
========

This PEP proposes the addition of an optional ``given`` clause to the
syntax for simple statements which may contain an expression, or may
substitute for such an expression for purely syntactic purposes. The
current list of simple statements that would be affected by this
addition is as follows:

* expression statement
* assignment statement
* augmented assignment statement
* del statement
* return statement
* yield statement
* raise statement
* assert statement
* pass statement

The ``given`` clause would allow subexpressions to be referenced by
name in the header line, with the actual definitions following in
the indented clause. As a simple example::

    c = sqrt(a*a + b*b) given:
        a, b = 3, 4

The ``pass`` statement is included to provide a consistent way to skip
inclusion of a meaningful expression in the header line. While this is not
an intended use case, it isn't one that can be prevented, as multiple
alternatives (such as ``...`` and ``()``) remain available even if ``pass``
itself is disallowed.
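For comparison, the closest approximation of the ``c = sqrt(a*a + b*b) given:`` example in current Python is to move the interim names into a throwaway function that is called immediately and then deleted. This is only a sketch of the idea, not the proposed semantics: ``_given_c`` is a hypothetical helper name, and the manual ``del`` is exactly the cleanup step the ``given`` clause is meant to make unnecessary.

```python
from math import sqrt

# Hypothetical helper approximating:
#     c = sqrt(a*a + b*b) given:
#         a, b = 3, 4
def _given_c():
    a, b = 3, 4  # interim names live only in this function's scope
    return sqrt(a*a + b*b)

c = _given_c()
del _given_c  # manual cleanup of the helper itself

print(c)                 # 5.0
print("a" in globals())  # False: 'a' never reached the module namespace
```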
Rationale
=========

Function and class statements in Python have a unique property
relative to ordinary assignment statements: to some degree, they are
*declarative*. They present the reader of the code with some critical
information about a name that is about to be defined, before
proceeding with the details of the actual definition in the
function or class body.

The *name* of the object being declared is the first thing stated
after the keyword. Other important information is also given the
honour of preceding the implementation details:

- decorators (which can greatly affect the behaviour of the created
  object, and were placed ahead of even the keyword and name as a matter
  of practicality more so than aesthetics)
- the docstring (on the first line immediately following the header line)
- parameters, default values and annotations for function definitions
- parent classes, metaclass and optionally other details (depending on
  the metaclass) for class definitions

This PEP proposes to make a similar declarative style available for
arbitrary assignment operations, by permitting the inclusion of a
"given" suite following any simple assignment statement::

    TARGET = [TARGET2 = ... TARGETN =] EXPR given:
        SUITE

By convention, code in the body of the suite should be oriented solely
towards correctly defining the assignment operation carried out in the
header line. The header line operation should also be adequately
descriptive (e.g. through appropriate choices of variable names) to
give a reader a reasonable idea of the purpose of the operation
without reading the body of the suite.

However, while they are the initial motivating use case, limiting this
feature solely to simple assignments would be overly restrictive. Once the
feature is defined at all, it would be quite arbitrary to prevent its use
for augmented assignments, return statements, yield expressions and
arbitrary expressions that may modify the application state.
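A rough way to get a "name first" reading order for a simple assignment in current Python is a decorator that calls the decorated function immediately and rebinds the name to its result. The ``call_now`` helper below is hypothetical (it is neither part of the standard library nor part of this proposal), and it only approximates ``TARGET = EXPR given: SUITE`` for a single plain name target.

```python
def call_now(func):
    """Hypothetical helper: call ``func`` immediately and return its result.

    Applied as a decorator, it rebinds the decorated name to the returned
    value rather than to the function object.
    """
    return func()


# The target name appears in the header, ahead of the calculation.
@call_now
def squares_by_name():
    # Interim working variable, local to this suite only.
    names = ["one", "two", "three"]
    return {name: (i + 1) ** 2 for i, name in enumerate(names)}


print(squares_by_name)  # {'one': 1, 'two': 4, 'three': 9}
```

The idiom still reads like a function definition and cannot express augmented assignments, subscript targets or ``return`` statements, so it covers only a small part of what the proposed clause would handle.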
The ``given`` clause may also function as a more readable
alternative to some uses of lambda expressions and similar
constructs when passing one-off functions to operations
like ``sorted()``.

In module and class level code, the ``given`` clause will serve as a
clear and reliable replacement for usage of the ``del`` statement to keep
interim working variables from polluting the resulting namespace.

One potentially useful way to think of the proposed clause is as a middle
ground between conventional in-line code and separation of an
operation out into a dedicated function.


Keyword Choice
==============

This proposal initially used ``where`` based on the name of a similar
construct in Haskell. However, it has been pointed out that there
are existing Python libraries (such as NumPy [4]_) that already use
``where`` in the SQL query condition sense, making that keyword choice
potentially confusing.

While ``given`` is currently a valid variable name (so its use as an
identifier would need to be deprecated using the usual ``__future__``
dance for introducing new keywords), it is associated much more strongly
with the desired "here are some extra variables this expression may use"
semantics for the new clause.

Reusing the ``with`` keyword has also been proposed. This has the
advantage of avoiding the addition of a new keyword, but also has
a high potential for confusion as the ``with`` clause and ``with``
statement would look similar but do completely different things.
That way lies C++ and Perl :)


Syntax Change
=============

Current::

    expr_stmt: testlist_star_expr (augassign (yield_expr|testlist) |
                 ('=' (yield_expr|testlist_star_expr))*)
    del_stmt: 'del' exprlist
    pass_stmt: 'pass'
    return_stmt: 'return' [testlist]
    yield_stmt: yield_expr
    raise_stmt: 'raise' [test ['from' test]]
    assert_stmt: 'assert' test [',' test]


New::

    expr_stmt: testlist_star_expr (augassign (yield_expr|testlist) |
                 ('=' (yield_expr|testlist_star_expr))*) [given_clause]
    del_stmt: 'del' exprlist [given_clause]
    pass_stmt: 'pass' [given_clause]
    return_stmt: 'return' [testlist] [given_clause]
    yield_stmt: yield_expr [given_clause]
    raise_stmt: 'raise' [test ['from' test]] [given_clause]
    assert_stmt: 'assert' test [',' test] [given_clause]
    given_clause: "given" ":" suite

(Note that ``expr_stmt`` in the grammar covers assignment and augmented
assignment in addition to simple expression statements.)

The new clause is added as an optional element of the existing statements
rather than as a new kind of compound statement in order to avoid creating
an ambiguity in the grammar. It is applied only to the specific elements
listed so that nonsense like the following is disallowed::

    break given:
        a = b = 1

However, the precise Grammar change described above is inadequate, as it
creates problems for the definition of simple_stmt (which allows chaining of
multiple single line statements with ";" rather than "\\n").

So the above syntax change should instead be taken as a statement of intent.
Any actual proposal would need to resolve the simple_stmt parsing problem
before it could be seriously considered. This would likely require a
non-trivial restructuring of the grammar, breaking up small_stmt and
flow_stmt to separate the statements that potentially contain arbitrary
subexpressions and then allowing a single one of those statements with
a ``given`` clause at the simple_stmt level.
Something along the lines of::

    stmt: simple_stmt | given_stmt | compound_stmt
    simple_stmt: small_stmt (';' (small_stmt | subexpr_stmt))* [';'] NEWLINE
    small_stmt: (pass_stmt | flow_stmt | import_stmt |
                 global_stmt | nonlocal_stmt)
    flow_stmt: break_stmt | continue_stmt
    given_stmt: subexpr_stmt (given_clause |
                  (';' (small_stmt | subexpr_stmt))* [';']) NEWLINE
    subexpr_stmt: expr_stmt | del_stmt | flow_subexpr_stmt | assert_stmt
    flow_subexpr_stmt: return_stmt | raise_stmt | yield_stmt
    given_clause: "given" ":" suite

For reference, here are the current definitions at that level::

    stmt: simple_stmt | compound_stmt
    simple_stmt: small_stmt (';' small_stmt)* [';'] NEWLINE
    small_stmt: (expr_stmt | del_stmt | pass_stmt | flow_stmt |
                 import_stmt | global_stmt | nonlocal_stmt | assert_stmt)
    flow_stmt: break_stmt | continue_stmt | return_stmt | raise_stmt | yield_stmt


Possible Implementation Strategy
================================

Torture Test
------------

An implementation of this PEP should support execution of the following
code at module, class and function scope::

    b = {}
    a = b[f(a)] = x given:
        x = 42
        def f(x):
            return x
    assert "x" not in locals()
    assert "f" not in locals()
    assert a == 42
    assert d[42] == 42 given:
        d = b
    assert "d" not in locals()
    y = y given:
        x = 42
        def f(): pass
        y = locals()["x"], f.__name__
    assert "x" not in locals()
    assert "f" not in locals()
    assert y == (42, "f")

Most naive implementations will choke on the first complex assignment,
while less naive but still broken implementations will fail when
the torture test is executed at class scope. Renaming based strategies
struggle to support ``locals()`` correctly and also have problems with
class and function ``__name__`` attributes.

And yes, that's a perfectly well-defined assignment statement. Insane, you
might rightly say, but legal::

    >>> def f(x): return x
    ...
    >>> x = 42
    >>> b = {}
    >>> a = b[f(a)] = x
    >>> a
    42
    >>> b
    {42: 42}

Details of Proposed Semantics
-----------------------------

AKA How Class Scopes Screw You When Attempting To Implement This

The natural idea when setting out to implement this concept is to
use an ordinary nested function scope. This doesn't work for the
two reasons mentioned in the Torture Test section above:

* Non-local variables are not your friend because they ignore class scopes
  and (when writing back to the outer scope) aren't really on speaking
  terms with module scopes either.

* Return-based semantics struggle with complex assignment statements
  like the one in the torture test.

The second thought is generally some kind of hidden renaming strategy. This
also creates problems, as Python exposes variable names via the ``locals()``
dictionary and class and function ``__name__`` attributes.

The most promising approach is one based on symtable analysis and
copy-in-copy-out referencing semantics to move any required name
bindings between the inner and outer scopes. The torture test above
would then translate to something like the following::

    b = {}
    def _anon1(b): # 'b' reference copied in
        x = 42
        def f(x):
            return x
        a = b[f(a)] = x
        return a # 'a' reference copied out
    a = _anon1(b)
    assert "x" not in locals()
    assert "f" not in locals()
    assert a == 42
    def _anon2(b): # 'b' reference copied in
        d = b
        assert d[42] == 42
        # Nothing to copy out (not an assignment)
    _anon2(b)
    assert "d" not in locals()
    def _anon3(): # Nothing to copy in (no references to other variables)
        x = 42
        def f(): pass
        y = locals()["x"], f.__name__
        y = y # Assuming no optimisation of special cases
        return y # 'y' reference copied out
    y = _anon3()
    assert "x" not in locals()
    assert "f" not in locals()
    assert y == (42, "f")

However, as noted in the abstract, an actual implementation of
this idea has never been tried.
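The class scope problem described above can be observed directly in current Python. In the sketch below, the names ``Demo``, ``_nested`` and ``_copied_in`` are illustrative only: a function defined in a class body skips the class namespace when it resolves a free name, whereas passing the value in as an argument (the "copy-in" half of the semantics described above) picks up the class-level binding.

```python
x = "module"


class Demo:
    x = "class"

    # Free-name lookup from a nested function skips the class scope,
    # so this finds the module-level x.
    def _nested():
        return x

    # Copying the value in as an argument evaluates x in the class body
    # itself, so the class-level binding is used.
    def _copied_in(x):
        return x

    via_nested = _nested()
    via_copy_in = _copied_in(x)

    # Manual cleanup so the helpers do not become class attributes.
    del _nested, _copied_in


print(Demo.via_nested)   # module
print(Demo.via_copy_in)  # class
```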
Detailed Semantics #1: Early Binding of Variable References
-----------------------------------------------------------

The copy-in-copy-out semantics mean that all variable references from a
``given`` clause will exhibit early binding behaviour, in contrast to the
late binding typically seen with references to closure variables and globals
in ordinary functions. This behaviour will allow the ``given`` clause to
be used as a substitute for the default argument hack when early binding
behaviour is desired::

    # Current Python (late binding)
    seq = []
    for i in range(10):
        def f():
            return i
        seq.append(f)
    assert [f() for f in seq] == [9]*10

    # Current Python (early binding via default argument hack)
    seq = []
    for i in range(10):
        def f(_i=i):
            return _i
        seq.append(f)
    assert [f() for f in seq] == list(range(10))

    # Early binding via given clause
    seq = []
    for i in range(10):
        seq.append(f) given:
            def f():
                return i
    assert [f() for f in seq] == list(range(10))

Note that the current intention is for the copy-in/copy-out semantics to
apply only to names defined in the local scope containing the ``given``
clause. Names in outer scopes will be referenced as normal.

This intention is subject to revision based on feedback and practicalities
of implementation.


Detailed Semantics #2: Handling of ``nonlocal`` and ``global``
--------------------------------------------------------------

``nonlocal`` and ``global`` will largely operate as if the anonymous
functions were defined as in the expansion above. However, they will also
override the default early-binding semantics for names from the containing
scope.

This intention is subject to revision based on feedback and practicalities
of implementation.


Detailed Semantics #3: Handling of ``break`` and ``continue``
-------------------------------------------------------------

``break`` and ``continue`` will operate as if the anonymous functions were
defined as in the expansion above.
They will be syntax errors if they occur in the ``given`` clause suite,
but will work normally if they appear within a ``for`` or ``while``
loop as part of that suite.


Detailed Semantics #4: Handling of ``return`` and ``yield``
-------------------------------------------------------------

``return`` and ``yield`` are explicitly disallowed in the ``given`` clause
suite and will be syntax errors if they occur. They will work normally if
they appear within a ``def`` statement within that suite.


Examples
========

Defining "one-off" classes which typically only have a single instance::

    # Current Python (instantiation after definition)
    class public_name():
        ... # However many lines
    public_name = public_name(*params)

    # Becomes:
    public_name = MeaningfulClassName(*params) given:
        class MeaningfulClassName():
            ... # Should trawl the stdlib for an example of doing this

Calculating attributes without polluting the local namespace (from os.py)::

    # Current Python (manual namespace cleanup)
    def _createenviron():
        ... # 27 line function

    environ = _createenviron()
    del _createenviron

    # Becomes:
    environ = _createenviron() given:
        def _createenviron():
            ... # 27 line function

Replacing the default argument hack (from functools.lru_cache)::

    # Current Python (default argument hack)
    def decorating_function(user_function,
                tuple=tuple, sorted=sorted, len=len, KeyError=KeyError):
        ... # 60 line function
    return decorating_function

    # Becomes:
    return decorating_function given:
        # Cell variables rather than locals, but should give similar speedup
        tuple, sorted, len, KeyError = tuple, sorted, len, KeyError
        def decorating_function(user_function):
            ... # 60 line function

    # This example also nicely makes it clear that there is nothing in the
    # function after the nested function definition. Due to additional
    # nested functions, that isn't entirely clear in the current code.
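Besides the default argument hack used in the examples above, current Python can already obtain early binding by creating each closure inside a factory function, so that the value is captured when the factory is called. This is roughly what the copy-in half of the proposed semantics would do implicitly; ``make_getter`` is a hypothetical name used only for illustration.

```python
def make_getter(value):
    # 'value' belongs to this particular call, so each returned closure
    # refers to its own cell rather than to the shared loop variable.
    def getter():
        return value
    return getter


seq = []
for i in range(10):
    seq.append(make_getter(i))

print([f() for f in seq])  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Unlike the default argument hack, the captured value cannot be overridden accidentally by a caller passing an extra argument, at the cost of an extra named function.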
Anticipated Objections
======================

* Two Ways To Do It: a lot of code may now be written with values
  defined either before the expression where they are used or
  afterwards in a ``given`` clause, creating two ways to do it,
  without an obvious way of choosing between them.

* Out of Order Execution: the ``given`` clause makes execution
  jump around a little strangely, as the body of the ``given``
  clause is executed before the simple statement in the clause
  header. The closest any other part of Python comes to this
  is the out of order evaluation in list comprehensions,
  generator expressions and conditional expressions.

* Harmful to Introspection: poking around in module and class internals
  is an invaluable tool for white-box testing and interactive debugging.
  The ``given`` clause will be quite effective at preventing access to
  temporary state used during calculations (although no more so than
  current usage of ``del`` statements in that regard).

These objections should not be dismissed lightly: the proposal
in this PEP needs to be subjected to the test of application to
a large code base (such as the standard library) in a search
for examples where the readability of real world code is genuinely
enhanced.

New PEP 8 guidelines would also need to be developed to provide
appropriate direction on when to use the ``given`` clause over
ordinary variable assignments. Some thoughts on possible guidelines are
provided at [7]_.


Possible Additions
==================

* The current proposal allows the addition of a ``given`` clause only
  for simple statements.
  Extending the idea to allow the use of compound statements
  would be quite possible, but doing so raises serious readability
  concerns (as values defined in the ``given`` clause may be used well
  before they are defined, exactly the kind of readability trap that
  other features like decorators and ``with`` statements are designed
  to eliminate).

* Currently only the outermost clause of comprehensions and generator
  expressions can reference the surrounding namespace when executed
  at class level. If this proposal is implemented successfully, the
  associated namespace semantics could allow that restriction to be
  lifted. There would be backwards compatibility implications in doing
  so, as existing code may be relying on the behaviour of ignoring
  class level variables, but the idea is worth considering.


Reference Implementation
========================

None as yet. If you want a crash course in Python namespace
semantics and code compilation, feel free to try ;)


Key Concern
===========

If a decision on the acceptance or rejection of this PEP had to be made
immediately, rejection would be far more likely. Unlike the previous
major syntax addition to Python (PEP 343's ``with`` statement), this
PEP as yet has no "killer application" of common code that is clearly and
obviously improved through the use of the new syntax. The ``with`` statement
(in conjunction with the generator enhancements in PEP 342) allowed
exception handling to be factored out into context managers in a way
that had never before been possible. Code using the new statement was
not only easier to read, but much easier to write correctly in the
first place.

In the case of this PEP, however, the "Two Ways to Do It" objection is a
strong one.
While the ability to break out subexpressions of a statement without
having to worry about name clashes with the rest of a function or script,
and without distracting from the operation that is the ultimate aim of the
statement, is potentially nice to have as a language feature, it doesn't
really provide significant expressive power over and above what is already
possible by assigning subexpressions to ordinary local variables before the
statement of interest. In particular, explaining to new Python programmers
when it is best to use a ``given`` clause and when to use normal local
variables is likely to be challenging and an unnecessary distraction.

"It might be kinda, sorta, nice to have, sometimes" really isn't a strong
argument for a new syntactic construct (particularly one this complicated).
"Status quo wins a stalemate" [5]_ is a very useful design principle, and I'm
not yet convinced that this PEP clears that hurdle.

The case for it has definitely strengthened over time though, which is why
this PEP remains Deferred rather than Rejected.


TO-DO
=====

* Mention two-suite in-order variants (and explain why they're even more
  pointless than the specific idea in the PEP)
* Mention PEP 359 and possible uses for locals() in the ``given`` clause


References
==========

.. [1] http://mail.python.org/pipermail/python-ideas/2010-June/007476.html

.. [2] http://mail.python.org/pipermail/python-ideas/2010-July/007584.html

.. [3] http://mail.python.org/pipermail/python-ideas/2009-July/005132.html

.. [4] http://mail.python.org/pipermail/python-ideas/2010-July/007596.html

.. [5] http://www.boredomandlaziness.org/2011/02/status-quo-wins-stalemate.html

.. [6] http://mail.python.org/pipermail/python-ideas/2011-April/009863.html

.. [7] http://mail.python.org/pipermail/python-ideas/2011-April/009869.html


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: