PEP: 3150
Title: Statement local namespaces (aka "given" clause)
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan@gmail.com>
Status: Withdrawn
Type: Standards Track
Content-Type: text/x-rst
Created: 2010-07-09
Python-Version: 3.4
Post-History: 2010-07-14, 2011-04-21, 2011-06-13
Resolution: TBD
Abstract
========
This PEP proposes the addition of an optional ``given`` clause to several
Python statements that do not currently have an associated code suite. This
clause will create a statement local namespace for additional names that are
accessible in the associated statement, but do not become part of the
containing namespace.
The primary motivation is to enable a more declarative style of programming,
where the operation to be performed is presented to the reader first, and the
details of the necessary subcalculations are presented in the following
indented suite. As a key example, this would elevate ordinary assignment
statements to be on par with ``class`` and ``def`` statements where the name
of the item to be defined is presented to the reader in advance of the
details of how the value of that item is calculated. It also allows named
functions to be used in a "multi-line lambda" fashion, where the name is used
solely as a placeholder in the current expression and then defined in the
following suite.
A secondary motivation is to simplify interim calculations in module and
class level code without polluting the resulting namespaces.
The intent is that the relationship between a given clause and a separate
function definition that performs the specified operation will be similar to
the existing relationship between an explicit while loop and a generator that
produces the same sequence of operations as that while loop.
The specific proposal in this PEP has been informed by various explorations
of this and related concepts over the years (e.g. [1]_, [2]_, [3]_, [6]_,
[8]_), and is inspired to some degree by the ``where`` and ``let`` clauses in
Haskell. It avoids some problems that have been identified in past proposals,
but has not yet itself been subject to the test of implementation.
PEP Withdrawal
==============
I've had a complicated history with this PEP. For a long time I left it in
the Deferred state because I wasn't convinced the additional complexity was
worth the payoff. Then, briefly, I became more enamoured of the idea and
only left it at Deferred because I didn't really have time to pursue it.
I'm now withdrawing it, as, the longer I reflect on the topic, the more I
feel this approach is simply far too intrusive and complicated to ever be
a good idea for Python as a language.
I've also finally found a couple of syntax proposals for PEP 403 that
read quite nicely and address the same set of use cases as this PEP
while remaining significantly simpler.
Proposal
========
This PEP proposes the addition of an optional ``given`` clause to the
syntax for simple statements which may contain an expression, or may
substitute for such a statement for purely syntactic purposes. The
current list of simple statements that would be affected by this
addition is as follows:
* expression statement
* assignment statement
* augmented assignment statement
* del statement
* return statement
* yield statement
* raise statement
* assert statement
* pass statement
The ``given`` clause would allow subexpressions to be referenced by
name in the header line, with the actual definitions following in
the indented clause. As a simple example::

    sorted_data = sorted(data, key=sort_key) given:
        def sort_key(item):
            return item.attr1, item.attr2

The ``pass`` statement is included to provide a consistent way to skip
inclusion of a meaningful expression in the header line. While this is not
an intended use case, it isn't one that can be prevented as multiple
alternatives (such as ``...`` and ``()``) remain available even if ``pass``
itself is disallowed.
Rationale
=========
Function and class statements in Python have a unique property
relative to ordinary assignment statements: to some degree, they are
*declarative*. They present the reader of the code with some critical
information about a name that is about to be defined, before
proceeding on with the details of the actual definition in the
function or class body.
The *name* of the object being declared is the first thing stated
after the keyword. Other important information is also given the
honour of preceding the implementation details:
- decorators (which can greatly affect the behaviour of the created
  object, and were placed ahead of even the keyword and name as a matter
  of practicality more so than aesthetics)
- the docstring (on the first line immediately following the header line)
- parameters, default values and annotations for function definitions
- parent classes, metaclass and optionally other details (depending on
  the metaclass) for class definitions
This PEP proposes to make a similar declarative style available for
arbitrary assignment operations, by permitting the inclusion of a
"given" suite following any simple assignment statement::

    TARGET = [TARGET2 = ... TARGETN =] EXPR given:
        SUITE

By convention, code in the body of the suite should be oriented solely
towards correctly defining the assignment operation carried out in the
header line. The header line operation should also be adequately
descriptive (e.g. through appropriate choices of variable names) to
give a reader a reasonable idea of the purpose of the operation
without reading the body of the suite.
However, while they are the initial motivating use case, limiting this
feature solely to simple assignments would be overly restrictive. Once the
feature is defined at all, it would be quite arbitrary to prevent its use
for augmented assignments, return statements, yield expressions and
arbitrary expressions that may modify the application state.
The ``given`` clause may also function as a more readable
alternative to some uses of lambda expressions and similar
constructs when passing one-off functions to operations
like ``sorted()``.
In module and class level code, the ``given`` clause will serve as a
clear and reliable replacement for usage of the ``del`` statement to keep
interim working variables from polluting the resulting namespace.
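
As a point of comparison, the cleanup idioms available in current Python
can be sketched as follows. The names ``_base``, ``_offsets``, ``TABLE`` and
``_build_table`` are purely illustrative and do not come from any real
module; the sketch only shows the manual ``del`` bookkeeping that a
``given`` clause is meant to make unnecessary.

```python
# Interim names used to build a module-level constant.
_base = 10
_offsets = (1, 2, 3)
TABLE = tuple(_base + off for off in _offsets)

# Manual cleanup: without this line the interim names stay in the namespace.
del _base, _offsets


# Alternative idiom: hide the interim names inside a helper function,
# then delete the helper itself once it has been called.
def _build_table():
    base = 10
    offsets = (1, 2, 3)
    return tuple(base + off for off in offsets)


TABLE2 = _build_table()
del _build_table

# Record which of the interim names are still visible at module level.
leftover = [name for name in ("_base", "_offsets", "_build_table")
            if name in globals()]
```

Both idioms work, but each relies on the author remembering the trailing
``del`` statement.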
One potentially useful way to think of the proposed clause is as a middle
ground between conventional in-line code and separation of an
operation out into a dedicated function, just as an inline while loop may
eventually be factored out into a dedicated generator.
Keyword Choice
==============
This proposal initially used ``where`` based on the name of a similar
construct in Haskell. However, it has been pointed out that there
are existing Python libraries (such as Numpy [4]_) that already use
``where`` in the SQL query condition sense, making that keyword choice
potentially confusing.
While ``given`` may also be used as a variable name (and hence would be
deprecated using the usual ``__future__`` dance for introducing
new keywords), it is associated much more strongly with the desired
"here are some extra variables this expression may use" semantics
for the new clause.
Reusing the ``with`` keyword has also been proposed. This has the
advantage of avoiding the addition of a new keyword, but also has
a high potential for confusion as the ``with`` clause and ``with``
statement would look similar but do completely different things.
That way lies C++ and Perl :)
Anticipated Objections
======================
Two Ways To Do It
-----------------
A lot of code may now be written with values defined either before the
expression where they are used or afterwards in a ``given`` clause, creating
two ways to do it, perhaps without an obvious way of choosing between them.
On reflection, I feel this is a misapplication of the "one obvious way"
aphorism. Python already offers *lots* of ways to write code. We can use
a for loop or a while loop, a functional style or an imperative style or an
object oriented style. The language, in general, is designed to let people
write code that matches the way they think. Since different people think
differently, the way they write their code will change accordingly.
Such stylistic questions in a code base are rightly left to the development
group responsible for that code. When does an expression get so complicated
that the subexpressions should be taken out and assigned to variables, even
though those variables are only going to be used once? When should an inline
while loop be replaced with a generator that implements the same logic?
Opinions differ, and that's OK.
However, explicit PEP 8 guidance will be needed for CPython and the standard
library, and that is discussed below.
Out of Order Execution
----------------------
The ``given`` clause makes execution jump around a little strangely, as the
body of the ``given`` clause is executed before the simple statement in the
clause header. The closest any other part of Python comes to this is the out
of order evaluation in list comprehensions, generator expressions and
conditional expressions and the delayed application of decorator functions to
the function they decorate (the decorator expressions themselves are executed
in the order they are written).
While this is true, the syntax is intended for cases where people are
themselves *thinking* about a problem out of sequence (at least as far as
the language is concerned). As an example of this, consider the following
thought in the mind of a Python user:

    I want to sort the items in this sequence according to the values of
    attr1 and attr2 on each item.

If they're comfortable with Python's ``lambda`` expressions, then they might
choose to write it like this::

    sorted_list = sorted(original, key=(lambda v: (v.attr1, v.attr2)))

That gets the job done, but it hardly reaches the standard of "executable
pseudocode" that Python aspires to, does it?
If they don't like ``lambda`` specifically, the ``operator`` module offers an
alternative that still allows the key function to be defined inline::

    sorted_list = sorted(original,
                         key=operator.attrgetter('attr1', 'attr2'))

Again, it gets the job done, but executable pseudocode it ain't.
If they think both of the above options are ugly and confusing, or they need
logic in their key function that can't be expressed as an expression (such
as catching an exception), then Python currently forces them to reverse the
order of their original thought and define the sorting criteria first::

    def sort_key(item):
        return item.attr1, item.attr2

    sorted_list = sorted(original, key=sort_key)

"Just define a function" has been the rote response to requests for multi-line
lambda support for years. As with the above options, it gets the job done,
but it really does represent a break between what the user is thinking and
what the language allows them to express.
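
For readers who want to check that the three current-Python spellings
really are interchangeable, here is a minimal runnable sketch. The sample
data built with ``types.SimpleNamespace`` is an assumption made purely for
illustration; only ``attr1`` and ``attr2`` come from the discussion above.

```python
import operator
from types import SimpleNamespace

# Hypothetical sample data with the two attributes used as sort criteria.
original = [
    SimpleNamespace(attr1=2, attr2="b"),
    SimpleNamespace(attr1=1, attr2="z"),
    SimpleNamespace(attr1=2, attr2="a"),
]

# Variant 1: lambda. The tuple needs its own parentheses, otherwise the
# comma ends the lambda body early.
by_lambda = sorted(original, key=lambda v: (v.attr1, v.attr2))

# Variant 2: attrgetter with several names returns a tuple of values.
by_getter = sorted(original, key=operator.attrgetter("attr1", "attr2"))


# Variant 3: a named function, which must be defined before it is used.
def sort_key(item):
    return item.attr1, item.attr2


by_function = sorted(original, key=sort_key)

# Flatten the result into plain tuples for easy inspection.
result = [(v.attr1, v.attr2) for v in by_function]
```

All three produce the same ordering; they differ only in where the reader
encounters the sorting criteria relative to the ``sorted()`` call.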
I believe the proposal in this PEP will finally let Python get close to the
"executable pseudocode" bar for the kind of thought expressed above::

    sorted_list = sorted(original, key=sort_key) given:
        def sort_key(item):
            return item.attr1, item.attr2

Everything is in the same order as it was in the user's original thought. The
only addition they have to make is to give the sorting criteria a name so that
the usage can be linked up to the subsequent definition.
One other useful note on this front is that this PEP allows existing out of
order execution constructs to be described as special cases of the more
general out of order execution syntax (just as comprehensions are now special
cases of the more general generator expression syntax, even though list
comprehensions existed first)::

    @classmethod
    def classname(cls):
        return cls.__name__

Would be roughly equivalent to::

    classname = f1(classname) given:
        f1 = classmethod
        def classname(cls):
            return cls.__name__

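
The delayed application of a decorator can already be spelled out by hand
in current Python, which may help when reading the ``given`` form. The
class names ``Decorated`` and ``Manual`` below are illustrative only.

```python
class Decorated:
    # Decorator syntax: classmethod is applied after the function is built.
    @classmethod
    def classname(cls):
        return cls.__name__


class Manual:
    # The same steps written in execution order: define the function first,
    # then rebind the name to the decorated result.
    def classname(cls):
        return cls.__name__
    classname = classmethod(classname)


decorated_name = Decorated.classname()
manual_name = Manual.classname()
```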
A list comprehension like ``squares = [x*x for x in range(10)]``
would be equivalent to::

    # Note: this example uses an explicit early binding variant that
    # isn't yet reflected in the rest of the PEP. It will get there, though.
    squares = seq given outermost=range(10):
        seq = []
        for x in outermost:
            seq.append(x*x)

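
A rough current-Python approximation of that expansion uses a throwaway
function whose single argument plays the role of the early-bound
``outermost`` name. The helper name ``_listcomp`` is hypothetical and was
chosen only for this sketch.

```python
def _listcomp(outermost):
    # The loop variable and accumulator live only in this function's scope.
    seq = []
    for x in outermost:
        seq.append(x * x)
    return seq


# The outermost iterable is evaluated in the calling scope and passed in.
squares = _listcomp(range(10))
del _listcomp
```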
Harmful to Introspection
------------------------
Poking around in module and class internals is an invaluable tool for
white-box testing and interactive debugging. The ``given`` clause will be
quite effective at preventing access to temporary state used during
calculations (although no more so than current usage of ``del`` statements
in that regard).
While this is a valid concern, design for testability is an issue that
cuts across many aspects of programming. If a component needs to be tested
independently, then a ``given`` statement should be refactored into separate
statements so that information is exposed to the test suite. This isn't
significantly different from refactoring an operation hidden inside a
function or generator out into its own function purely to allow it to be
tested in isolation.
Lack of Real World Impact Assessment
------------------------------------
The examples in the current PEP are almost all relatively small "toy"
examples. The proposal in this PEP needs to be subjected to the test of
application to a large code base (such as the standard library or a large
Twisted application) in a search for examples where the readability of real
world code is genuinely enhanced.
This is more a deficiency in the PEP than in the idea, though.
New PEP 8 Guidelines
====================
As discussed on python-ideas ([7]_, [9]_) new PEP 8 guidelines would also
need to be developed to provide appropriate direction on when to use the
``given`` clause over ordinary variable assignments.
Based on the similar guidelines already present for ``try`` statements, this
PEP proposes the following additions for ``given`` statements to the
"Programming Conventions" section of PEP 8:
- for code that could reasonably be factored out into a separate function,
  but is not currently reused anywhere, consider using a ``given`` clause.
  This clearly indicates which variables are being used only to define
  subcomponents of another statement rather than to hold algorithm or
  application state.
- keep ``given`` clauses concise. If they become unwieldy, either break
  them up into multiple steps or else move the details into a separate
  function.
Syntax Change
=============
Current::

    expr_stmt: testlist_star_expr (augassign (yield_expr|testlist) |
                 ('=' (yield_expr|testlist_star_expr))*)
    del_stmt: 'del' exprlist
    pass_stmt: 'pass'
    return_stmt: 'return' [testlist]
    yield_stmt: yield_expr
    raise_stmt: 'raise' [test ['from' test]]
    assert_stmt: 'assert' test [',' test]

New::

    expr_stmt: testlist_star_expr (augassign (yield_expr|testlist) |
                 ('=' (yield_expr|testlist_star_expr))*) [given_clause]
    del_stmt: 'del' exprlist [given_clause]
    pass_stmt: 'pass' [given_clause]
    return_stmt: 'return' [testlist] [given_clause]
    yield_stmt: yield_expr [given_clause]
    raise_stmt: 'raise' [test ['from' test]] [given_clause]
    assert_stmt: 'assert' test [',' test] [given_clause]
    given_clause: "given" ":" suite

(Note that ``expr_stmt`` in the grammar is a slight misnomer, as it covers
assignment and augmented assignment in addition to simple expression
statements)
The new clause is added as an optional element of the existing statements
rather than as a new kind of compound statement in order to avoid creating
an ambiguity in the grammar. It is applied only to the specific elements
listed so that nonsense like the following is disallowed::

    break given:
        a = b = 1

    import sys given:
        a = b = 1

However, the precise grammar change described above is inadequate, as it
creates problems for the definition of simple_stmt (which allows chaining of
multiple single line statements with ";" rather than "\\n").
So the above syntax change should instead be taken as a statement of intent.
Any actual proposal would need to resolve the simple_stmt parsing problem
before it could be seriously considered. This would likely require a
non-trivial restructuring of the grammar, breaking up small_stmt and
flow_stmt to separate the statements that potentially contain arbitrary
subexpressions and then allowing a single one of those statements with
a ``given`` clause at the simple_stmt level. Something along the lines of::

    stmt: simple_stmt | given_stmt | compound_stmt
    simple_stmt: small_stmt (';' (small_stmt | subexpr_stmt))* [';'] NEWLINE
    small_stmt: (pass_stmt | flow_stmt | import_stmt |
                 global_stmt | nonlocal_stmt)
    flow_stmt: break_stmt | continue_stmt
    given_stmt: subexpr_stmt (given_clause |
                  (';' (small_stmt | subexpr_stmt))* [';']) NEWLINE
    subexpr_stmt: expr_stmt | del_stmt | flow_subexpr_stmt | assert_stmt
    flow_subexpr_stmt: return_stmt | raise_stmt | yield_stmt
    given_clause: "given" ":" suite

For reference, here are the current definitions at that level::

    stmt: simple_stmt | compound_stmt
    simple_stmt: small_stmt (';' small_stmt)* [';'] NEWLINE
    small_stmt: (expr_stmt | del_stmt | pass_stmt | flow_stmt |
                 import_stmt | global_stmt | nonlocal_stmt | assert_stmt)
    flow_stmt: break_stmt | continue_stmt | return_stmt | raise_stmt | yield_stmt

Possible Implementation Strategy
================================
Torture Test
------------
An implementation of this PEP should support execution of the following
code at module, class and function scope::

    b = {}
    a = b[f(a)] = x given:
        x = 42
        def f(x):
            return x
    assert "x" not in locals()
    assert "f" not in locals()
    assert a == 42
    assert d[42] == 42 given:
        d = b
    assert "d" not in locals()
    y = y given:
        x = 42
        def f(): pass
        y = locals()["x"], f.__name__
    assert "x" not in locals()
    assert "f" not in locals()
    assert y == (42, "f")

Most naive implementations will choke on the first complex assignment,
while less naive but still broken implementations will fail when
the torture test is executed at class scope. Renaming based strategies
struggle to support ``locals()`` correctly and also have problems with
class and function ``__name__`` attributes.
And yes, that's a perfectly well-defined assignment statement. Insane,
you might rightly say, but legal::

    >>> def f(x): return x
    ...
    >>> x = 42
    >>> b = {}
    >>> a = b[f(a)] = x
    >>> a
    42
    >>> b
    {42: 42}

Details of Proposed Semantics
-----------------------------
AKA How Class Scopes Screw You When Attempting To Implement This
The natural idea when setting out to implement this concept is to
use an ordinary nested function scope. This doesn't work for the
two reasons mentioned in the Torture Test section above:
* Non-local variables are not your friend because they ignore class scopes
  and (when writing back to the outer scope) aren't really on speaking
  terms with module scopes either.
* Return-based semantics struggle with complex assignment statements
  like the one in the torture test.
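
Both halves of the first point can be observed in current Python. The
following is a minimal sketch with made-up names (``Example``, ``scale``,
``scaled``); it assumes no module-level name ``scale`` exists.

```python
class Example:
    scale = 3

    def scaled(value):
        # 'scale' is resolved in enclosing function scopes and then in the
        # module globals; the class body is skipped entirely.
        return value * scale

    # Calling the nested function while the class body is still executing
    # therefore fails, even though 'scale' was bound one line earlier.
    try:
        result = scaled(2)
    except NameError:
        result = "NameError"


# 'nonlocal' refuses to bind to a module-level name, so a nested function
# cannot use it to write results back to module scope.
source = "x = 1\ndef f():\n    nonlocal x\n    x = 2\n"
try:
    compile(source, "<demo>", "exec")
except SyntaxError:
    nonlocal_at_module_scope = "SyntaxError"
else:
    nonlocal_at_module_scope = "accepted"
```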
The second thought is generally some kind of hidden renaming strategy. This
also creates problems, as Python exposes variable names via the ``locals()``
dictionary and class and function ``__name__`` attributes.
The most promising approach is one based on symtable analysis and
copy-in-copy-out referencing semantics to move any required name
bindings between the inner and outer scopes. The torture test above
would then translate to something like the following::

    b = {}
    def _anon1(b): # 'b' reference copied in
        x = 42
        def f(x):
            return x
        a = b[f(a)] = x
        return a # 'a' reference copied out
    a = _anon1(b)
    assert "x" not in locals()
    assert "f" not in locals()
    assert a == 42
    def _anon2(b): # 'b' reference copied in
        d = b
        assert d[42] == 42
        # Nothing to copy out (not an assignment)
    _anon2(b)
    assert "d" not in locals()
    def _anon3(): # Nothing to copy in (no references to other variables)
        x = 42
        def f(): pass
        y = locals()["x"], f.__name__
        y = y # Assuming no optimisation of special cases
        return y # 'y' reference copied out
    y = _anon3()
    assert "x" not in locals()
    assert "f" not in locals()
    assert y == (42, "f")

However, as noted in the abstract, an actual implementation of
this idea has never been tried.
Detailed Semantics #1: Early Binding of Variable References
-----------------------------------------------------------
The copy-in-copy-out semantics mean that all variable references from a
``given`` clause will exhibit early binding behaviour, in contrast to the
late binding typically seen with references to closure variables and globals
in ordinary functions. This behaviour will allow the ``given`` clause to
be used as a substitute for the default argument hack when early binding
behaviour is desired::

    # Current Python (late binding)
    seq = []
    for i in range(10):
        def f():
            return i
        seq.append(f)
    assert [f() for f in seq] == [9]*10

    # Current Python (early binding via default argument hack)
    seq = []
    for i in range(10):
        def f(_i=i):
            return _i
        seq.append(f)
    assert [f() for f in seq] == list(range(10))

    # Early binding via given clause
    seq = []
    for i in range(10):
        seq.append(f) given:
            def f():
                return i
    assert [f() for f in seq] == list(range(10))

Note that the current intention is for the copy-in/copy-out semantics to
apply only to names defined in the local scope containing the ``given``
clause. Names in outer scopes will be referenced as normal.
This intention is subject to revision based on feedback and practicalities
of implementation.
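
The copy-in step can be approximated in current Python with an explicit
factory function, which may make the intended early binding easier to
picture. This is only a sketch of the idea: the name ``_anon`` is
hypothetical, and a real implementation would generate the equivalent
machinery in the compiler rather than in source code.

```python
seq = []
for i in range(10):
    def _anon(i):
        # The current value of the loop variable is copied in as an
        # argument, so each inner function closes over its own 'i'.
        def f():
            return i
        return f
    seq.append(_anon(i))

results = [f() for f in seq]
```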
Detailed Semantics #2: Handling of ``nonlocal`` and ``global``
--------------------------------------------------------------
``nonlocal`` and ``global`` will largely operate as if the anonymous
functions were defined as in the expansion above. However, they will also
override the default early-binding semantics for names from the containing
scope.
This intention is subject to revision based on feedback and practicalities
of implementation.
Detailed Semantics #3: Handling of ``break`` and ``continue``
-------------------------------------------------------------
``break`` and ``continue`` will operate as if the anonymous functions were
defined as in the expansion above. They will be syntax errors if they occur
in the ``given`` clause suite but will work normally if they appear within
a ``for`` or ``while`` loop as part of that suite.
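
The anonymous function analogy matches how current Python already treats
``break`` inside a nested function. The sketch below uses the built-in
``compile()`` to check two snippets; the helper ``compiles`` and the
snippet contents are illustrative assumptions.

```python
def compiles(source):
    """Return True if the source compiles, False on a SyntaxError."""
    try:
        compile(source, "<demo>", "exec")
    except SyntaxError:
        return False
    return True


# 'break' sits directly in a function body; the enclosing loop belongs to
# the outer scope, so the compiler rejects it.
bare_break = (
    "for i in range(3):\n"
    "    def body():\n"
    "        break\n"
)

# The loop is part of the function body itself, so 'break' is fine.
loop_inside = (
    "def body():\n"
    "    for i in range(3):\n"
    "        break\n"
)

bare_break_ok = compiles(bare_break)
loop_inside_ok = compiles(loop_inside)
```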
Detailed Semantics #4: Handling of ``return`` and ``yield``
-------------------------------------------------------------
``return`` and ``yield`` are explicitly disallowed in the ``given`` clause
suite and will be syntax errors if they occur. They will work normally if
they appear within a ``def`` statement within that suite.
Alternative Semantics for Name Binding
--------------------------------------
The "early binding" semantics proposed for the ``given`` clause are driven
by the desire to have ``given`` clauses work "normally" in class scopes (that
is, allowing them to see the local variables in the class, even though classes
do not participate in normal lexical scoping).
There is an alternative, which is to simply declare that the ``given`` clause
creates an ordinary nested scope, just like comprehensions and generator
expressions. Thus, the given clause would share the same quirks as those
constructs: they exhibit surprising behaviour at class scope, since they
can't see the local variables in the class definition. While this behaviour
is considered desirable for method definitions (where class variables are
accessed via the class or instance argument passed to the method), it can be
surprising and inconvenient for implicit scopes that are designed to hide
their own name bindings from the containing scope rather than vice-versa.
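
The class scope quirk of comprehensions can be reproduced directly in
current Python. In this sketch the class names ``Works`` and ``Fails`` and
their attributes are made up for illustration, and it assumes there is no
module-level name ``factor``.

```python
class Works:
    values = [1, 2, 3]
    # Only the outermost iterable is evaluated in the class scope, so
    # 'values' is found without any trouble.
    doubled = [v * 2 for v in values]


try:
    class Fails:
        values = [1, 2, 3]
        factor = 2
        # 'factor' is referenced from inside the implicit scope, which
        # cannot see names bound in the class body.
        scaled = [v * factor for v in values]
except NameError:
    outcome = "NameError"
else:
    outcome = "ok"
```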
A third alternative, more analogous to the comprehension case (where the
outermost iterator expression is evaluated in the current scope and hence can
see class locals normally), would be to allow *explicit* early binding in the
``given`` clause, by passing an optional tuple of assignments after the
``given`` keyword::

    # Explicit early binding via given clause
    seq = []
    for i in range(10):
        seq.append(f) given i=i:
            def f():
                return i
    assert [f() for f in seq] == list(range(10))

(Note: I actually like the explicit early binding idea significantly more
than I do the implicit early binding - expect a future version of the PEP
to be updated accordingly. I've already used it above when describing how
an existing construct like a list comprehension could be expressed as a
special case of the new syntax)
Examples
========
Defining "one-off" classes which typically only have a single instance::

    # Current Python (instantiation after definition)
    class public_name():
        ... # However many lines
    public_name = public_name(*params)

    # Current Python (custom decorator)
    def singleton(*args, **kwds):
        def decorator(cls):
            return cls(*args, **kwds)
        return decorator

    @singleton(*params)
    class public_name():
        ... # However many lines

    # Becomes:
    public_name = MeaningfulClassName(*params) given:
        class MeaningfulClassName():
            ... # Should trawl the stdlib for an example of doing this

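
For concreteness, here is a runnable instance of the custom decorator
approach. The ``config`` class and its ``host`` and ``port`` parameters are
hypothetical stand-ins for ``public_name`` and ``params``.

```python
def singleton(*args, **kwds):
    def decorator(cls):
        # Replace the class with a single instance built from the
        # arguments captured by the outer call.
        return cls(*args, **kwds)
    return decorator


@singleton("localhost", port=8080)
class config:
    def __init__(self, host, port):
        self.host = host
        self.port = port


# After decoration, 'config' names the instance rather than the class.
instance_type_name = type(config).__name__
```

Note that the instance and its class end up sharing one name, which is the
wart the ``given`` version avoids by giving the class a meaningful name of
its own.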
Calculating attributes without polluting the local namespace (from os.py)::

    # Current Python (manual namespace cleanup)
    def _createenviron():
        ... # 27 line function

    environ = _createenviron()
    del _createenviron

    # Becomes:
    environ = _createenviron() given:
        def _createenviron():
            ... # 27 line function

Replacing default argument hack (from functools.lru_cache)::

    # Current Python (default argument hack)
    def decorating_function(user_function,
                   tuple=tuple, sorted=sorted, len=len, KeyError=KeyError):
        ... # 60 line function
    return decorating_function

    # Becomes:
    return decorating_function given:
        # Cell variables rather than locals, but should give similar speedup
        tuple, sorted, len, KeyError = tuple, sorted, len, KeyError
        def decorating_function(user_function):
            ... # 60 line function

    # This example also nicely makes it clear that there is nothing in the
    # function after the nested function definition. Due to additional
    # nested functions, that isn't entirely clear in the current code.

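
The difference between "locals" and "cell variables" mentioned in that
comment can be seen in current Python by comparing the default argument
hack with an explicit closure. The function names below are illustrative
and are not taken from ``functools``; no claim is made here about the
relative speed of the two forms.

```python
def count_items_default(items, len=len):
    # Default argument hack: 'len' becomes an ordinary local variable.
    return len(items)


def _make_count_items():
    _len = len  # captured once; stored in a closure cell
    def count_items(items):
        return _len(items)
    return count_items


count_items_closure = _make_count_items()

# Inspect where each function keeps its reference to the builtin.
default_locals = count_items_default.__code__.co_varnames
closure_cells = count_items_closure.__code__.co_freevars
```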
Possible Additions
==================
* The current proposal allows the addition of a ``given`` clause only
  for simple statements. Extending the idea to allow the use of
  compound statements would be quite possible (by appending the given
  clause as an independent suite at the end), but doing so raises
  serious readability concerns (as values defined in the ``given``
  clause may be used well before they are defined, exactly the kind
  of readability trap that other features like decorators and ``with``
  statements are designed to eliminate).
* The "explicit early binding" variant may be applicable to the discussions
  on python-ideas on how to eliminate the default argument hack. A ``given``
  clause in the header line for functions may be the answer to that question.
Reference Implementation
========================
None as yet. If you want a crash course in Python namespace
semantics and code compilation, feel free to try ;)
TO-DO
=====
* Mention two-suite in-order variants (and explain why they're even more
  pointless than the specific idea in the PEP)
* Mention PEP 359 and possible uses for locals() in the ``given`` clause
References
==========
.. [1] Explicitation lines in Python:
   http://mail.python.org/pipermail/python-ideas/2010-June/007476.html

.. [2] 'where' statement in Python:
   http://mail.python.org/pipermail/python-ideas/2010-July/007584.html

.. [3] Where-statement (Proposal for function expressions):
   http://mail.python.org/pipermail/python-ideas/2009-July/005132.html

.. [4] Name conflict with NumPy for 'where' keyword choice:
   http://mail.python.org/pipermail/python-ideas/2010-July/007596.html

.. [5] The "Status quo wins a stalemate" design principle:
   http://www.boredomandlaziness.org/2011/02/status-quo-wins-stalemate.html

.. [6] Assignments in list/generator expressions:
   http://mail.python.org/pipermail/python-ideas/2011-April/009863.html

.. [7] Possible PEP 3150 style guidelines (#1):
   http://mail.python.org/pipermail/python-ideas/2011-April/009869.html

.. [8] Discussion of PEP 403 (statement local function definition):
   http://mail.python.org/pipermail/python-ideas/2011-October/012276.html

.. [9] Possible PEP 3150 style guidelines (#2):
   http://mail.python.org/pipermail/python-ideas/2011-October/012341.html

Copyright
=========
This document has been placed in the public domain.
..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: