PEP: 227
Title: Statically Nested Scopes
Version: $Revision$
Author: jeremy@digicool.com (Jeremy Hylton)
Status: Draft
Type: Standards Track
Python-Version: 2.1
Created: 01-Nov-2000
Post-History:

Abstract

    This PEP proposes the addition of statically nested scoping
    (lexical scoping) for Python 2.1.  The current language definition
    defines exactly three namespaces that are used to resolve names --
    the local, global, and built-in namespaces.  The addition of
    nested scopes would allow resolution of unbound local names in
    enclosing functions' namespaces.

    One consequence of this change that will be most visible to Python
    programmers is that lambda expressions could reference variables
    in the namespaces where the lambda is defined.  Currently, a
    lambda expression uses default arguments to explicitly create
    bindings in the lambda's namespace.

Specification

    Python is a statically scoped language with block structure, in
    the tradition of Algol.  A code block or region, such as a module,
    class definition, or function body, is the basic unit of a
    program.

    Names refer to objects.  Names are introduced by name binding
    operations.  Each occurrence of a name in the program text refers
    to the binding of that name established in the innermost function
    block containing the use.

    The name binding operations are assignment, class and function
    definition, and import statements.  Each assignment or import
    statement occurs within a block defined by a class or function
    definition or at the module level (the top-level code block).

    If a name binding operation occurs anywhere within a code block,
    all uses of the name within the block are treated as references to
    the current block.  (Note: This can lead to errors when a name is
    used within a block before it is bound.)

    If the global statement occurs within a block, all uses of the
    name specified in the statement refer to the binding of that name
    in the top-level namespace.

    Names are resolved in the top-level namespace by searching the
    global namespace, the namespace of the module containing the code
    block, and the builtin namespace, the namespace of the module
    __builtin__.  The global namespace is searched first.  If the name
    is not found there, the builtin namespace is searched.

    If a name is used within a code block, but it is not bound there
    and is not declared global, the use is treated as a reference to
    the nearest enclosing function region.  (Note: If a region is
    contained within a class definition, the name bindings that occur
    in the class block are not visible to enclosed functions.)

    A class definition is an executable statement that may contain
    uses and definitions of names.  These references follow the normal
    rules for name resolution.  The namespace of the class definition
    becomes the attribute dictionary of the class.

Discussion

    This proposal changes the rules for resolving free variables in
    Python functions.  The Python 2.0 definition specifies exactly
    three namespaces to check for each name -- the local namespace,
    the global namespace, and the builtin namespace.  According to
    this definition, if a function A is defined within a function B,
    the names bound in B are not visible in A.  The proposal changes
    the rules so that names bound in B are visible in A (unless A
    contains a name binding that hides the binding in B).

    The specification introduces rules for lexical scoping that are
    common in Algol-like languages.  The combination of lexical
    scoping and existing support for first-class functions is
    reminiscent of Scheme.

    The changed scoping rules address two problems -- the limited
    utility of lambda expressions and the frequent confusion of new
    users familiar with other languages that support lexical scoping,
    e.g. the inability to define recursive functions except at the
    module level.

    The lambda expression introduces an unnamed function that contains
    a single expression.  It is often used for callback functions.
    In the example below (written using the Python 2.0 rules), any
    name used in the body of the lambda must be explicitly passed as a
    default argument to the lambda.

        from Tkinter import *
        root = Tk()
        Button(root, text="Click here",
               command=lambda root=root: root.test.configure(text="..."))

    This approach is cumbersome, particularly when there are several
    names used in the body of the lambda.  The long list of default
    arguments obscures the purpose of the code.  The proposed
    solution, in crude terms, implements the default argument approach
    automatically.  The "root=root" argument can be omitted.

    The specified rules allow names defined in a function to be
    referenced in any nested function defined within that function.
    The name resolution rules are typical for statically scoped
    languages, with three primary exceptions:

        - Names in class scope are not accessible.
        - The global statement short-circuits the normal rules.
        - Variables are not declared.

    Names in class scope are not accessible.  Names are resolved in
    the innermost enclosing function scope.  If a class definition
    occurs in a chain of nested scopes, the resolution process skips
    class definitions.  This rule prevents odd interactions between
    class attributes and local variable access.  If a name binding
    operation occurs in a class definition, it creates an attribute on
    the resulting class object.  To access this variable in a method,
    or in a function nested within a method, an attribute reference
    must be used, either via self or via the class name.

    An alternative would have been to allow name binding in class
    scope to behave exactly like name binding in function scope.  This
    rule would allow class attributes to be referenced either via
    attribute reference or simple name.  This option was ruled out
    because it would have been inconsistent with all other forms of
    class and instance attribute access, which always use attribute
    references.  Code that used simple names would have been obscure.
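
    The class-scope rule described above can be sketched as follows,
    assuming an interpreter in which the proposed rules are in effect.
    The names make_scaler_class, Scaler, class_offset, scale and
    bare_lookup are hypothetical and exist only for this illustration.

```python
def make_scaler_class(factor):
    class Scaler:
        class_offset = 10  # becomes the attribute Scaler.class_offset

        def scale(self, n):
            # `factor` is found in the enclosing function's namespace.
            # `class_offset` must be reached through an attribute reference.
            return self.class_offset + n * factor

        def bare_lookup(self):
            # Name resolution skips the class block, so the bare name is
            # searched for in make_scaler_class, then in the global and
            # builtin namespaces, and is found in none of them.
            try:
                return class_offset
            except NameError:
                return None

    return Scaler
```

    The method scale() sees factor through the enclosing function but
    reaches class_offset only via self; bare_lookup() shows that the
    simple name is not resolved in the class block.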

    The global statement short-circuits the normal rules.  Under the
    proposal, the global statement has exactly the same effect that it
    does for Python 2.0.  Its behavior is preserved for backwards
    compatibility.  It is also noteworthy because it allows name
    binding operations performed in one block to change bindings in
    another block (the module).

    Variables are not declared.  If a name binding operation occurs
    anywhere in a function, then that name is treated as local to the
    function and all references refer to the local binding.  If a
    reference occurs before the name is bound, a NameError is raised.
    The only kind of declaration is the global statement, which allows
    programs to be written using mutable global variables.  As a
    consequence, it is not possible to rebind a name defined in an
    enclosing scope.  An assignment operation can only bind a name in
    the current scope or in the global scope.  The lack of
    declarations and the inability to rebind names in enclosing scopes
    are unusual for lexically scoped languages; there is typically a
    mechanism to create name bindings (e.g. lambda and let in Scheme)
    and a mechanism to change the bindings (set! in Scheme).

Examples

    A few examples are included to illustrate the way the rules work.

    >>> def make_fact():
    ...     def fact(n):
    ...         if n == 1:
    ...             return 1L
    ...         else:
    ...             return n * fact(n - 1)
    ...     return fact
    >>> fact = make_fact()
    >>> fact(7)
    5040L

    >>> def make_adder(base):
    ...     def adder(x):
    ...         return base + x
    ...     return adder
    >>> add5 = make_adder(5)
    >>> add5(6)
    11

    >>> def make_wrapper(obj):
    ...     class Wrapper:
    ...         def __getattr__(self, attr):
    ...             if attr[0] != '_':
    ...                 return getattr(obj, attr)
    ...             else:
    ...                 raise AttributeError, attr
    ...     return Wrapper()
    >>> class Test:
    ...     public = 2
    ...     _private = 3
    >>> w = make_wrapper(Test())
    >>> w.public
    2
    >>> w._private
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    AttributeError: _private

    An example from Tim Peters of the potential pitfalls of nested
    scopes in the absence of declarations:

        i = 6
        def f(x):
            def g():
                print i
            # ...
            # skip to the next page
            # ...
            for i in x:  # ah, i *is* local to f, so this is what g sees
                pass
            g()

    The call to g() will refer to the variable i bound in f() by the
    for loop.  If g() is called before the loop is executed, a
    NameError will be raised.

Other issues

Backwards compatibility

    The proposed changes will break backwards compatibility for some
    code.  The following example from Skip Montanaro illustrates:

        x = 1
        def f1():
            x = 2
            def inner():
                print x
            inner()

    Under the Python 2.0 rules, the print statement inside inner()
    refers to the global variable x and will print 1 if f1() is
    called.  Under the new rules, it refers to f1()'s namespace, the
    nearest enclosing scope with a binding, and will print 2.

    The problem occurs only when a global variable and a local
    variable share the same name and a nested function uses that name
    to refer to the global variable.  This is poor programming
    practice, because readers will easily confuse the two different
    variables.

    To address this problem, which is unlikely to occur often, a
    static analysis tool that detects affected code will be written.
    The detection problem is straightforward.

locals() / vars()

    These functions return a dictionary containing the current scope's
    local variables.  Modifications to the dictionary do not affect
    the values of variables.  Under the current rules, the use of
    locals() and globals() allows the program to gain access to all
    the namespaces in which names are resolved.

    An analogous function will not be provided for nested scopes.
    Under this proposal, it will not be possible to gain
    dictionary-style access to all visible scopes.
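
    The statement that modifications to the dictionary returned by
    locals() do not affect the values of variables can be checked with
    a short sketch.  It assumes CPython's behavior for function
    scopes; the function name modify_locals_dict and its variables are
    hypothetical.

```python
def modify_locals_dict():
    total = 1
    namespace = locals()      # dictionary of the current local names
    seen = namespace["total"]  # the dictionary reflects the binding of total
    namespace["total"] = 99   # changes the dictionary only
    # The local variable itself is untouched by the dictionary update.
    return seen, total
```

    The function returns the value read from the dictionary together
    with the value of the variable after the dictionary was modified;
    both are the original value of total.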

Rebinding names in enclosing scopes

    There are technical issues that make it difficult to support
    rebinding of names in enclosing scopes, but the primary reason
    that it is not allowed in the current proposal is that Guido is
    opposed to it.  It is difficult to support, because it would
    require a new mechanism that would allow the programmer to specify
    that an assignment in a block is supposed to rebind the name in an
    enclosing block; presumably a keyword or special syntax (x := 3)
    would make this possible.

    The proposed rules allow programmers to achieve the effect of
    rebinding, albeit awkwardly.  The name that will be effectively
    rebound by enclosed functions is bound to a container object.  In
    place of assignment, the program uses modification of the
    container to achieve the desired effect:

        def bank_account(initial_balance):
            balance = [initial_balance]
            def deposit(amount):
                balance[0] = balance[0] + amount
                return balance
            def withdraw(amount):
                balance[0] = balance[0] - amount
                return balance
            return deposit, withdraw

    Support for rebinding in nested scopes would make this code
    clearer.  A class that defines deposit() and withdraw() methods
    and the balance as an instance variable would be clearer still.
    Since classes seem to achieve the same effect in a more
    straightforward manner, they are preferred.

Implementation

    An implementation effort is underway.

    The implementation requires a way to create closures, objects
    that combine a function's code and the environment in which to
    resolve free variables.

    There are a variety of implementation alternatives for closures.
    Two typical ones are nested closures and flat closures.  Nested
    closures use a static link from a nested function to its enclosing
    environment.  This implementation requires several links to be
    followed if there is more than one level of nesting and keeps many
    garbage objects alive longer than necessary.

    Flat closures are roughly similar to the default argument hack
    currently used for lambda support.
    Each function object would have a func_env slot that holds a
    tuple of free variable bindings.  The code inside the function
    would use LOAD_ENV to access these bindings rather than the
    typical LOAD_FAST.

    The problem with this approach is that rebindings are not visible
    to the nested function.  Consider the following example:

        import threading
        import time

        def outer():
            x = 2
            def inner():
                while 1:
                    print x
                    time.sleep(1)
            threading.Thread(target=inner).start()
            while 1:
                x = x + 1
                time.sleep(0.8)

    If the func_env slot is defined when MAKE_FUNCTION is called, then
    x in inner() is bound to the value of x in outer() at function
    definition time.  This is the default argument hack, but not
    actual name resolution based on statically nested scopes.

    To support shared visibility of updates, it will be necessary to
    have a tuple of cells that contain references to variables.  The
    extra level of indirection should allow updates to be shared.

    It is not clear whether the current 1-pass Python compiler can
    determine which references are to globals and which are references
    to enclosing scopes.  It may be possible to make minimal changes
    that defer the optimize() call until a second pass, after scopes
    have been determined.


Local Variables:
mode: indented-text
indent-tabs-mode: nil
End: