PEP: 3150 Title: Statement local namespaces (aka "given" clause) Version: $Revision$ Last-Modified: $Date$ Author: Nick Coghlan Status: Withdrawn Type: Standards Track Content-Type: text/x-rst Created: 2010-07-09 Python-Version: 3.4 Post-History: 2010-07-14, 2011-04-21, 2011-06-13 Resolution: TBD Abstract ======== This PEP proposes the addition of an optional ``given`` clause to several Python statements that do not currently have an associated code suite. This clause will create a statement local namespace for additional names that are accessible in the associated statement, but do not become part of the containing namespace. The primary motivation is to enable a more declarative style of programming, where the operation to be performed is presented to the reader first, and the details of the necessary subcalculations are presented in the following indented suite. As a key example, this would elevate ordinary assignment statements to be on par with ``class`` and ``def`` statements where the name of the item to be defined is presented to the reader in advance of the details of how the value of that item is calculated. It also allows named functions to be used in a "multi-line lambda" fashion, where the name is used solely as a placeholder in the current expression and then defined in the following suite. A secondary motivation is to simplify interim calculations in module and class level code without polluting the resulting namespaces. The intent is that the relationship between a given clause and a separate function definition that performs the specified operation will be similar to the existing relationship between an explicit while loop and a generator that produces the same sequence of operations as that while loop. The specific proposal in this PEP has been informed by various explorations of this and related concepts over the years (e.g. [1]_, [2]_, [3]_, [6]_, [8]_), and is inspired to some degree by the ``where`` and ``let`` clauses in Haskell. It avoids some problems that have been identified in past proposals, but has not yet itself been subject to the test of implementation. PEP Withdrawal ============== I've had a complicated history with this PEP. For a long time I left it in the Deferred state because I wasn't convinced the additional complexity was worth the payoff. Then, briefly, I became more enamoured of the idea and only left it at Deferred because I didn't really have time to pursue it. I'm now withdrawing it, as, the longer I reflect on the topic, the more I feel this approach is simply far too intrusive and complicated to ever be a good idea for Python as a language. I've also finally found a couple of syntax proposals for PEP 403 that read quite nicely and address the same set of use cases as this PEP while remaining significantly simpler. Proposal ======== This PEP proposes the addition of an optional ``given`` clause to the syntax for simple statements which may contain an expression, or may substitute for such a statement for purely syntactic purposes. The current list of simple statements that would be affected by this addition is as follows: * expression statement * assignment statement * augmented assignment statement * del statement * return statement * yield statement * raise statement * assert statement * pass statement The ``given`` clause would allow subexpressions to be referenced by name in the header line, with the actual definitions following in the indented clause. As a simple example:: sorted_data = sorted(data, key=sort_key) given: def sort_key(item): return item.attr1, item.attr2 The ``pass`` statement is included to provide a consistent way to skip inclusion of a meaningful expression in the header line. While this is not an intended use case, it isn't one that can be prevented as multiple alternatives (such as ``...`` and ``()``) remain available even if ``pass`` itself is disallowed. Rationale ========= Function and class statements in Python have a unique property relative to ordinary assignment statements: to some degree, they are *declarative*. They present the reader of the code with some critical information about a name that is about to be defined, before proceeding on with the details of the actual definition in the function or class body. The *name* of the object being declared is the first thing stated after the keyword. Other important information is also given the honour of preceding the implementation details: - decorators (which can greatly affect the behaviour of the created object, and were placed ahead of even the keyword and name as a matter of practicality moreso than aesthetics) - the docstring (on the first line immediately following the header line) - parameters, default values and annotations for function definitions - parent classes, metaclass and optionally other details (depending on the metaclass) for class definitions This PEP proposes to make a similar declarative style available for arbitrary assignment operations, by permitting the inclusion of a "given" suite following any simple assignment statement:: TARGET = [TARGET2 = ... TARGETN =] EXPR given: SUITE By convention, code in the body of the suite should be oriented solely towards correctly defining the assignment operation carried out in the header line. The header line operation should also be adequately descriptive (e.g. through appropriate choices of variable names) to give a reader a reasonable idea of the purpose of the operation without reading the body of the suite. However, while they are the initial motivating use case, limiting this feature solely to simple assignments would be overly restrictive. Once the feature is defined at all, it would be quite arbitrary to prevent its use for augmented assignments, return statements, yield expressions and arbitrary expressions that may modify the application state. The ``given`` clause may also function as a more readable alternative to some uses of lambda expressions and similar constructs when passing one-off functions to operations like ``sorted()``. In module and class level code, the ``given`` clause will serve as a clear and reliable replacement for usage of the ``del`` statement to keep interim working variables from polluting the resulting namespace. One potentially useful way to think of the proposed clause is as a middle ground between conventional in-line code and separation of an operation out into a dedicated function, just as an inline while loop may eventually be factored out into a dedicator generator. Keyword Choice ============== This proposal initially used ``where`` based on the name of a similar construct in Haskell. However, it has been pointed out that there are existing Python libraries (such as Numpy [4]_) that already use ``where`` in the SQL query condition sense, making that keyword choice potentially confusing. While ``given`` may also be used as a variable name (and hence would be deprecated using the usual ``__future__`` dance for introducing new keywords), it is associated much more strongly with the desired "here are some extra variables this expression may use" semantics for the new clause. Reusing the ``with`` keyword has also been proposed. This has the advantage of avoiding the addition of a new keyword, but also has a high potential for confusion as the ``with`` clause and ``with`` statement would look similar but do completely different things. That way lies C++ and Perl :) Anticipated Objections ====================== Two Ways To Do It ----------------- A lot of code may now be written with values defined either before the expression where they are used or afterwards in a ``given`` clause, creating two ways to do it, perhaps without an obvious way of choosing between them. On reflection, I feel this is a misapplication of the "one obvious way" aphorism. Python already offers *lots* of ways to write code. We can use a for loop or a while loop, a functional style or an imperative style or an object oriented style. The language, in general, is designed to let people write code that matches the way they think. Since different people think differently, the way they write their code will change accordingly. Such stylistic questions in a code base are rightly left to the development group responsible for that code. When does an expression get so complicated that the subexpressions should be taken out and assigned to variables, even though those variables are only going to be used once? When should an inline while loop be replaced with a generator that implements the same logic? Opinions differ, and that's OK. However, explicit PEP 8 guidance will be needed for CPython and the standard library, and that is discussed below. Out of Order Execution ---------------------- The ``given`` clause makes execution jump around a little strangely, as the body of the ``given`` clause is executed before the simple statement in the clause header. The closest any other part of Python comes to this is the out of order evaluation in list comprehensions, generator expressions and conditional expressions and the delayed application of decorator functions to the function they decorate (the decorator expressions themselves are executed in the order they are written). While this is true, the syntax is intended for cases where people are themselves *thinking* about a problem out of sequence (at least as far as the language is concerned). As an example of this, consider the following thought in the mind of a Python user: I want to sort the items in this sequence according to the values of attr1 and attr2 on each item. If they're comfortable with Python's ``lambda`` expressions, then they might choose to write it like this:: sorted_list = sorted(original, key=(lambda v: v.attr1, v.attr2)) That gets the job done, but it hardly reaches the standard of ``executable pseudocode`` that Python aspires to, does it? If they don't like ``lambda`` specifically, the ``operator`` module offers an alternative that still allows the key function to be defined inline:: sorted_list = sorted(original, key=operator.attrgetter(v. 'attr1', 'attr2')) Again, it gets the job done, but executable pseudocode it ain't. If they think both of the above options are ugly and confusing, or they need logic in their key function that can't be expressed as an expression (such as catching an exception), then Python currently forces them to reverse the order of their original thought and define the sorting criteria first:: def sort_key(item): return item.attr1, item.attr2 sorted_list = sorted(original, key=sort_key) "Just define a function" has been the rote response to requests for multi-line lambda support for years. As with the above options, it gets the job done, but it really does represent a break between what the user is thinking and what the language allows them to express. I believe the proposal in this PEP will finally let Python get close to the "executable pseudocode" bar for the kind of thought expressed above:: sorted_list = sorted(original, key=sort_key) given: def sort_key(item): return item.attr1, item.attr2 Everything is in the same order as it was in the user's original thought, the only addition they have to make is to give the sorting criteria a name so that the usage can be linked up to the subsequent definition. One other useful note on this front, is that this PEP allows existing out of order execution constructs to be described as special cases of the more general out of order execution syntax (just as comprehensions are now special cases of the more general generator expression syntax, even though list comprehensions existed first):: @classmethod def classname(cls): return cls.__name__ Would be roughly equivalent to:: classname = f1(classname) given: f1 = classmethod def classname(cls): return cls.__name__ A list comprehension like ``squares = [x*x for x in range(10)]`` would be equivalent to:: # Note: this example uses an explicit early binding variant that # isn't yet reflected in the rest of the PEP. It will get there, though. squares = seq given outermost=range(10): seq = [] for x in outermost: seq.append(x*x) Harmful to Introspection ------------------------ Poking around in module and class internals is an invaluable tool for white-box testing and interactive debugging. The ``given`` clause will be quite effective at preventing access to temporary state used during calculations (although no more so than current usage of ``del`` statements in that regard). While this is a valid concern, design for testability is an issue that cuts across many aspects of programming. If a component needs to be tested independently, then a ``given`` statement should be refactored in to separate statements so that information is exposed to the test suite. This isn't significantly different from refactoring an operation hidden inside a function or generator out into its own function purely to allow it to be tested in isolation. Lack of Real World Impact Assessment ------------------------------------ The examples in the current PEP are almost all relatively small "toy" examples. The proposal in this PEP needs to be subjected to the test of application to a large code base (such as the standard library or a large Twisted application) in a search for examples where the readability of real world code is genuinely enhanced. This is more of a deficiency in the PEP rather than the idea, though. New PEP 8 Guidelines ==================== As discussed on python-ideas ([7]_, [9]_) new PEP 8 guidelines would also need to be developed to provide appropriate direction on when to use the ``given`` clause over ordinary variable assignments. Based on the similar guidelines already present for ``try`` statements, this PEP proposes the following additions for ``given`` statements to the "Programming Conventions" section of PEP 8: - for code that could reasonably be factored out into a separate function, but is not currently reused anywhere, consider using a ``given`` clause. This clearly indicates which variables are being used only to define subcomponents of another statement rather than to hold algorithm or application state. - keep ``given`` clauses concise. If they become unwieldy, either break them up into multiple steps or else move the details into a separate function. Syntax Change ============= Current:: expr_stmt: testlist_star_expr (augassign (yield_expr|testlist) | ('=' (yield_expr|testlist_star_expr))*) del_stmt: 'del' exprlist pass_stmt: 'pass' return_stmt: 'return' [testlist] yield_stmt: yield_expr raise_stmt: 'raise' [test ['from' test]] assert_stmt: 'assert' test [',' test] New:: expr_stmt: testlist_star_expr (augassign (yield_expr|testlist) | ('=' (yield_expr|testlist_star_expr))*) [given_clause] del_stmt: 'del' exprlist [given_clause] pass_stmt: 'pass' [given_clause] return_stmt: 'return' [testlist] [given_clause] yield_stmt: yield_expr [given_clause] raise_stmt: 'raise' [test ['from' test]] [given_clause] assert_stmt: 'assert' test [',' test] [given_clause] given_clause: "given" ":" suite (Note that ``expr_stmt`` in the grammar is a slight misnomer, as it covers assignment and augmented assignment in addition to simple expression statements) The new clause is added as an optional element of the existing statements rather than as a new kind of compound statement in order to avoid creating an ambiguity in the grammar. It is applied only to the specific elements listed so that nonsense like the following is disallowed:: break given: a = b = 1 import sys given: a = b = 1 However, the precise Grammar change described above is inadequate, as it creates problems for the definition of simple_stmt (which allows chaining of multiple single line statements with ";" rather than "\\n"). So the above syntax change should instead be taken as a statement of intent. Any actual proposal would need to resolve the simple_stmt parsing problem before it could be seriously considered. This would likely require a non-trivial restructuring of the grammar, breaking up small_stmt and flow_stmt to separate the statements that potentially contain arbitrary subexpressions and then allowing a single one of those statements with a ``given`` clause at the simple_stmt level. Something along the lines of:: stmt: simple_stmt | given_stmt | compound_stmt simple_stmt: small_stmt (';' (small_stmt | subexpr_stmt))* [';'] NEWLINE small_stmt: (pass_stmt | flow_stmt | import_stmt | global_stmt | nonlocal_stmt) flow_stmt: break_stmt | continue_stmt given_stmt: subexpr_stmt (given_clause | (';' (small_stmt | subexpr_stmt))* [';']) NEWLINE subexpr_stmt: expr_stmt | del_stmt | flow_subexpr_stmt | assert_stmt flow_subexpr_stmt: return_stmt | raise_stmt | yield_stmt given_clause: "given" ":" suite For reference, here are the current definitions at that level:: stmt: simple_stmt | compound_stmt simple_stmt: small_stmt (';' small_stmt)* [';'] NEWLINE small_stmt: (expr_stmt | del_stmt | pass_stmt | flow_stmt | import_stmt | global_stmt | nonlocal_stmt | assert_stmt) flow_stmt: break_stmt | continue_stmt | return_stmt | raise_stmt | yield_stmt Possible Implementation Strategy ================================ Torture Test ------------ An implementation of this PEP should support execution of the following code at module, class and function scope:: b = {} a = b[f(a)] = x given: x = 42 def f(x): return x assert "x" not in locals() assert "f" not in locals() assert a == 42 assert d[42] == 42 given: d = b assert "d" not in locals() y = y given: x = 42 def f(): pass y = locals("x"), f.__name__ assert "x" not in locals() assert "f" not in locals() assert y == (42, "f") Most naive implementations will choke on the first complex assignment, while less naive but still broken implementations will fail when the torture test is executed at class scope. Renaming based strategies struggle to support ``locals()`` correctly and also have problems with class and function ``__name__`` attributes. And yes, that's a perfectly well-defined assignment statement. Insane, you might rightly say, but legal:: >>> def f(x): return x ... >>> x = 42 >>> b = {} >>> a = b[f(a)] = x >>> a 42 >>> b {42: 42} Details of Proposed Semantics ----------------------------- AKA How Class Scopes Screw You When Attempting To Implement This The natural idea when setting out to implement this concept is to use an ordinary nested function scope. This doesn't work for the two reasons mentioned in the Torture Test section above: * Non-local variables are not your friend because they ignore class scopes and (when writing back to the outer scope) aren't really on speaking terms with module scopes either. * Return-based semantics struggle with complex assignment statements like the one in the torture test The second thought is generally some kind of hidden renaming strategy. This also creates problems, as Python exposes variables names via the ``locals()`` dictionary and class and function ``__name__`` attributes. The most promising approach is one based on symtable analysis and copy-in-copy-out referencing semantics to move any required name bindings between the inner and outer scopes. The torture test above would then translate to something like the following:: b = {} def _anon1(b): # 'b' reference copied in x = 42 def f(x): return x a = b[f(a)] = x return a # 'a' reference copied out a = _anon1(b) assert "x" not in locals() assert "f" not in locals() assert a == 42 def _anon2(b) # 'b' reference copied in d = b assert d[42] == 42 # Nothing to copy out (not an assignment) _anon2() assert "d" not in locals() def _anon3() # Nothing to copy in (no references to other variables) x = 42 def f(): pass y = locals("x"), f.__name__ y = y # Assuming no optimisation of special cases return y # 'y' reference copied out y = _anon3() assert "x" not in locals() assert "f" not in locals() assert y == (42, "f") However, as noted in the abstract, an actual implementation of this idea has never been tried. Detailed Semantics #1: Early Binding of Variable References ----------------------------------------------------------- The copy-in-copy-out semantics mean that all variable references from a ``given`` clause will exhibit early binding behaviour, in contrast to the late binding typically seen with references to closure variables and globals in ordinary functions. This behaviour will allow the ``given`` clause to be used as a substitute for the default argument hack when early binding behaviour is desired:: # Current Python (late binding) seq = [] for i in range(10): def f(): return i seq.append(f) assert [f() for f in seq] == [9]*10 # Current Python (early binding via default argument hack) seq = [] for i in range(10): def f(_i=i): return i seq.append(f) assert [f() for f in seq] == list(range(10)) # Early binding via given clause seq = [] for i in range(10): seq.append(f) given: def f(): return i assert [f() for f in seq] == list(range(10)) Note that the current intention is for the copy-in/copy-out semantics to apply only to names defined in the local scope containing the ``given`` clause. Name in outer scopes will be referenced as normal. This intention is subject to revision based on feedback and practicalities of implementation. Detailed Semantics #2: Handling of ``nonlocal`` and ``global`` -------------------------------------------------------------- ``nonlocal`` and ``global`` will largely operate as if the anonymous functions were defined as in the expansion above. However, they will also override the default early-binding semantics for names from the containing scope. This intention is subject to revision based on feedback and practicalities of implementation. Detailed Semantics #3: Handling of ``break`` and ``continue`` ------------------------------------------------------------- ``break`` and ``continue`` will operate as if the anonymous functions were defined as in the expansion above. They will be syntax errors if they occur in the ``given`` clause suite but will work normally if they appear within a ``for`` or ``while`` loop as part of that suite. Detailed Semantics #4: Handling of ``return`` and ``yield`` ------------------------------------------------------------- ``return`` and ``yield`` are explicitly disallowed in the ``given`` clause suite and will be syntax errors if they occur. They will work normally if they appear within a ``def`` statement within that suite. Alternative Semantics for Name Binding -------------------------------------- The "early binding" semantics proposed for the ``given`` clause are driven by the desire to have ``given`` clauses work "normally" in class scopes (that is, allowing them to see the local variables in the class, even though classes do not participate in normal lexical scoping). There is an alternative, which is to simply declare that the ``given`` clause creates an ordinary nested scope, just like comprehensions and generator expressions. Thus, the given clause would share the same quirks as those constructs: they exhibit surprising behaviour at class scope, since they can't see the local variables in the class definition. While this behaviour is considered desirable for method definitions (where class variables are accessed via the class or instance argument passed to the method), it can be surprising and inconvenient for implicit scopes that are designed to hide their own name bindings from the containing scope rather than vice-versa. A third alternative, more analogous to the comprehension case (where the outermost iterator expression is evaluated in the current scope and hence can see class locals normally), would be to allow *explicit* early binding in the ``given`` clause, by passing an optional tuple of assignments after the ``given`` keyword:: # Explicit early binding via given clause seq = [] for i in range(10): seq.append(f) given i=i: def f(): return i assert [f() for f in seq] == list(range(10)) (Note: I actually like the explicit early binding idea significantly more than I do the implicit early binding - expect a future version of the PEP to be updated accordingly. I've already used it above when describing how an existing construct like a list comprehension could be expressed as a special case of the new syntax) Examples ======== Defining "one-off" classes which typically only have a single instance:: # Current Python (instantiation after definition) class public_name(): ... # However many lines public_name = public_name(*params) # Current Python (custom decorator) def singleton(*args, **kwds): def decorator(cls): return cls(*args, **kwds) return decorator @singleton(*params) class public_name(): ... # However many lines public_name = public_name(*params) # Becomes: public_name = MeaningfulClassName(*params) given: class MeaningfulClassName(): ... # Should trawl the stdlib for an example of doing this Calculating attributes without polluting the local namespace (from os.py):: # Current Python (manual namespace cleanup) def _createenviron(): ... # 27 line function environ = _createenviron() del _createenviron # Becomes: environ = _createenviron() given: def _createenviron(): ... # 27 line function Replacing default argument hack (from functools.lru_cache):: # Current Python (default argument hack) def decorating_function(user_function, tuple=tuple, sorted=sorted, len=len, KeyError=KeyError): ... # 60 line function return decorating_function # Becomes: return decorating_function given: # Cell variables rather than locals, but should give similar speedup tuple, sorted, len, KeyError = tuple, sorted, len, KeyError def decorating_function(user_function): ... # 60 line function # This example also nicely makes it clear that there is nothing in the # function after the nested function definition. Due to additional # nested functions, that isn't entirely clear in the current code. Possible Additions ================== * The current proposal allows the addition of a ``given`` clause only for simple statements. Extending the idea to allow the use of compound statements would be quite possible (by appending the given clause as an independent suite at the end), but doing so raises serious readability concerns (as values defined in the ``given`` clause may be used well before they are defined, exactly the kind of readability trap that other features like decorators and ``with`` statements are designed to eliminate) * The "explicit early binding" variant may be applicable to the discussions on python-ideas on how to eliminate the default argument hack. A ``given`` clause in the header line for functions may be the answer to that question. Reference Implementation ======================== None as yet. If you want a crash course in Python namespace semantics and code compilation, feel free to try ;) TO-DO ===== * Mention two-suite in-order variants (and explain why they're even more pointless than the specific idea in the PEP) * Mention PEP 359 and possible uses for locals() in the ``given`` clause References ========== .. [1] Explicitation lines in Python: http://mail.python.org/pipermail/python-ideas/2010-June/007476.html .. [2] 'where' statement in Python: http://mail.python.org/pipermail/python-ideas/2010-July/007584.html .. [3] Where-statement (Proposal for function expressions): http://mail.python.org/pipermail/python-ideas/2009-July/005132.html .. [4] Name conflict with NumPy for 'where' keyword choice: http://mail.python.org/pipermail/python-ideas/2010-July/007596.html .. [5] The "Status quo wins a stalemate" design principle: http://www.boredomandlaziness.org/2011/02/status-quo-wins-stalemate.html .. [6] Assignments in list/generator expressions: http://mail.python.org/pipermail/python-ideas/2011-April/009863.html .. [7] Possible PEP 3150 style guidelines (#1): http://mail.python.org/pipermail/python-ideas/2011-April/009869.html .. [8] Discussion of PEP 403 (statement local function definition): http://mail.python.org/pipermail/python-ideas/2011-October/012276.html .. [9] Possible PEP 3150 style guidelines (#2): http://mail.python.org/pipermail/python-ideas/2011-October/012341.html Copyright ========= This document has been placed in the public domain. .. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: