python-peps/peps/pep-3150.rst

775 lines
30 KiB
ReStructuredText

PEP: 3150
Title: Statement local namespaces (aka "given" clause)
Version: $Revision$
Last-Modified: $Date$
Author: Alyssa Coghlan <ncoghlan@gmail.com>
Status: Deferred
Type: Standards Track
Content-Type: text/x-rst
Created: 09-Jul-2010
Python-Version: 3.4
Post-History: 14-Jul-2010, 21-Apr-2011, 13-Jun-2011
Abstract
========
This PEP proposes the addition of an optional ``given`` clause to several
Python statements that do not currently have an associated code suite. This
clause will create a statement local namespace for additional names that are
accessible in the associated statement, but do not become part of the
containing namespace.
Adoption of a new symbol, ``?``, is proposed to denote a forward reference
to the namespace created by running the associated code suite. It will be
a reference to a ``types.SimpleNamespace`` object.
The primary motivation is to enable a more declarative style of programming,
where the operation to be performed is presented to the reader first, and the
details of the necessary subcalculations are presented in the following
indented suite. As a key example, this would elevate ordinary assignment
statements to be on par with ``class`` and ``def`` statements where the name
of the item to be defined is presented to the reader in advance of the
details of how the value of that item is calculated. It also allows named
functions to be used in a "multi-line lambda" fashion, where the name is used
solely as a placeholder in the current expression and then defined in the
following suite.
A secondary motivation is to simplify interim calculations in module and
class level code without polluting the resulting namespaces.
The intent is that the relationship between a given clause and a separate
function definition that performs the specified operation will be similar to
the existing relationship between an explicit while loop and a generator that
produces the same sequence of operations as that while loop.
The specific proposal in this PEP has been informed by various explorations
of this and related concepts over the years (e.g. [1]_, [2]_, [3]_, [6]_,
[8]_), and is inspired to some degree by the ``where`` and ``let`` clauses in
Haskell. It avoids some problems that have been identified in past proposals,
but has not yet itself been subject to the test of implementation.
Proposal
========
This PEP proposes the addition of an optional ``given`` clause to the
syntax for simple statements which may contain an expression, or may
substitute for such a statement for purely syntactic purposes. The
current list of simple statements that would be affected by this
addition is as follows:
* expression statement
* assignment statement
* augmented assignment statement
* del statement
* return statement
* yield statement
* raise statement
* assert statement
* pass statement
The ``given`` clause would allow subexpressions to be referenced by
name in the header line, with the actual definitions following in
the indented clause. As a simple example::
sorted_data = sorted(data, key=?.sort_key) given:
def sort_key(item):
return item.attr1, item.attr2
The new symbol ``?`` is used to refer to the given namespace. It would be a
``types.SimpleNamespace`` instance, so ``?.sort_key`` functions as
a forward reference to a name defined in the ``given`` clause.
A docstring would be permitted in the given clause, and would be attached
to the result namespace as its ``__doc__`` attribute.
The ``pass`` statement is included to provide a consistent way to skip
inclusion of a meaningful expression in the header line. While this is not
an intended use case, it isn't one that can be prevented as multiple
alternatives (such as ``...`` and ``()``) remain available even if ``pass``
itself is disallowed.
The body of the given clause will execute in a new scope, using normal
function closure semantics. To support early binding of loop variables
and global references, as well as to allow access to other names defined at
class scope, the ``given`` clause will also allow explicit
binding operations in the header line::
# Explicit early binding via given clause
seq = []
for i in range(10):
seq.append(?.f) given i=i in:
def f():
return i
assert [f() for f in seq] == list(range(10))
Semantics
---------
The following statement::
op(?.f, ?.g) given bound_a=a, bound_b=b in:
def f():
return bound_a + bound_b
def g():
return bound_a - bound_b
Would be roughly equivalent to the following code (``__var`` denotes a
hidden compiler variable or simply an entry on the interpreter stack)::
__arg1 = a
__arg2 = b
def __scope(bound_a, bound_b):
def f():
return bound_a + bound_b
def g():
return bound_a - bound_b
return types.SimpleNamespace(**locals())
__ref = __scope(__arg1, __arg2)
__ref.__doc__ = __scope.__doc__
op(__ref.f, __ref.g)
A ``given`` clause is essentially a nested function which is created and
then immediately executed. Unless explicitly passed in, names are looked
up using normal scoping rules, and thus names defined at class scope will
not be visible. Names declared as forward references are returned and
used in the header statement, without being bound locally in the
surrounding namespace.
Syntax Change
-------------
Current::
expr_stmt: testlist_star_expr (augassign (yield_expr|testlist) |
('=' (yield_expr|testlist_star_expr))*)
del_stmt: 'del' exprlist
pass_stmt: 'pass'
return_stmt: 'return' [testlist]
yield_stmt: yield_expr
raise_stmt: 'raise' [test ['from' test]]
assert_stmt: 'assert' test [',' test]
New::
expr_stmt: testlist_star_expr (augassign (yield_expr|testlist) |
('=' (yield_expr|testlist_star_expr))*) [given_clause]
del_stmt: 'del' exprlist [given_clause]
pass_stmt: 'pass' [given_clause]
return_stmt: 'return' [testlist] [given_clause]
yield_stmt: yield_expr [given_clause]
raise_stmt: 'raise' [test ['from' test]] [given_clause]
assert_stmt: 'assert' test [',' test] [given_clause]
given_clause: "given" [(NAME '=' test)+ "in"]":" suite
(Note that ``expr_stmt`` in the grammar is a slight misnomer, as it covers
assignment and augmented assignment in addition to simple expression
statements)
.. note::
These proposed grammar changes don't yet cover the forward reference
expression syntax for accessing names defined in the statement local
namespace.
The new clause is added as an optional element of the existing statements
rather than as a new kind of compound statement in order to avoid creating
an ambiguity in the grammar. It is applied only to the specific elements
listed so that nonsense like the following is disallowed::
break given:
a = b = 1
import sys given:
a = b = 1
However, the precise Grammar change described above is inadequate, as it
creates problems for the definition of simple_stmt (which allows chaining of
multiple single line statements with ";" rather than "\\n").
So the above syntax change should instead be taken as a statement of intent.
Any actual proposal would need to resolve the simple_stmt parsing problem
before it could be seriously considered. This would likely require a
non-trivial restructuring of the grammar, breaking up small_stmt and
flow_stmt to separate the statements that potentially contain arbitrary
subexpressions and then allowing a single one of those statements with
a ``given`` clause at the simple_stmt level. Something along the lines of::
stmt: simple_stmt | given_stmt | compound_stmt
simple_stmt: small_stmt (';' (small_stmt | subexpr_stmt))* [';'] NEWLINE
small_stmt: (pass_stmt | flow_stmt | import_stmt |
global_stmt | nonlocal_stmt)
flow_stmt: break_stmt | continue_stmt
given_stmt: subexpr_stmt (given_clause |
(';' (small_stmt | subexpr_stmt))* [';']) NEWLINE
subexpr_stmt: expr_stmt | del_stmt | flow_subexpr_stmt | assert_stmt
flow_subexpr_stmt: return_stmt | raise_stmt | yield_stmt
given_clause: "given" (NAME '=' test)* ":" suite
For reference, here are the current definitions at that level::
stmt: simple_stmt | compound_stmt
simple_stmt: small_stmt (';' small_stmt)* [';'] NEWLINE
small_stmt: (expr_stmt | del_stmt | pass_stmt | flow_stmt |
import_stmt | global_stmt | nonlocal_stmt | assert_stmt)
flow_stmt: break_stmt | continue_stmt | return_stmt | raise_stmt | yield_stmt
In addition to the above changes, the definition of ``atom`` would be changed
to also allow ``?``. The restriction of this usage to statements with
an associated ``given`` clause would be handled by a later stage of the
compilation process (likely AST construction, which already enforces
other restrictions where the grammar is overly permissive in order to
simplify the initial parsing step).
New PEP 8 Guidelines
--------------------
As discussed on python-ideas ([7]_, [9]_) new :pep:`8` guidelines would also
need to be developed to provide appropriate direction on when to use the
``given`` clause over ordinary variable assignments.
Based on the similar guidelines already present for ``try`` statements, this
PEP proposes the following additions for ``given`` statements to the
"Programming Conventions" section of :pep:`8`:
- for code that could reasonably be factored out into a separate function,
but is not currently reused anywhere, consider using a ``given`` clause.
This clearly indicates which variables are being used only to define
subcomponents of another statement rather than to hold algorithm or
application state. This is an especially useful technique when
passing multi-line functions to operations which take callable
arguments.
- keep ``given`` clauses concise. If they become unwieldy, either break
them up into multiple steps or else move the details into a separate
function.
Rationale
=========
Function and class statements in Python have a unique property
relative to ordinary assignment statements: to some degree, they are
*declarative*. They present the reader of the code with some critical
information about a name that is about to be defined, before
proceeding on with the details of the actual definition in the
function or class body.
The *name* of the object being declared is the first thing stated
after the keyword. Other important information is also given the
honour of preceding the implementation details:
- decorators (which can greatly affect the behaviour of the created
object, and were placed ahead of even the keyword and name as a matter
of practicality more so than aesthetics)
- the docstring (on the first line immediately following the header line)
- parameters, default values and annotations for function definitions
- parent classes, metaclass and optionally other details (depending on
the metaclass) for class definitions
This PEP proposes to make a similar declarative style available for
arbitrary assignment operations, by permitting the inclusion of a
"given" suite following any simple assignment statement::
TARGET = [TARGET2 = ... TARGETN =] EXPR given:
SUITE
By convention, code in the body of the suite should be oriented solely
towards correctly defining the assignment operation carried out in the
header line. The header line operation should also be adequately
descriptive (e.g. through appropriate choices of variable names) to
give a reader a reasonable idea of the purpose of the operation
without reading the body of the suite.
However, while they are the initial motivating use case, limiting this
feature solely to simple assignments would be overly restrictive. Once the
feature is defined at all, it would be quite arbitrary to prevent its use
for augmented assignments, return statements, yield expressions,
comprehensions and arbitrary expressions that may modify the
application state.
The ``given`` clause may also function as a more readable
alternative to some uses of lambda expressions and similar
constructs when passing one-off functions to operations
like ``sorted()`` or in callback based event-driven programming.
In module and class level code, the ``given`` clause will serve as a
clear and reliable replacement for usage of the ``del`` statement to keep
interim working variables from polluting the resulting namespace.
One potentially useful way to think of the proposed clause is as a middle
ground between conventional in-line code and separation of an
operation out into a dedicated function, just as an inline while loop may
eventually be factored out into a dedicated generator.
Design Discussion
=================
Keyword Choice
--------------
This proposal initially used ``where`` based on the name of a similar
construct in Haskell. However, it has been pointed out that there
are existing Python libraries (such as Numpy [4]_) that already use
``where`` in the SQL query condition sense, making that keyword choice
potentially confusing.
While ``given`` may also be used as a variable name (and hence would be
deprecated using the usual ``__future__`` dance for introducing
new keywords), it is associated much more strongly with the desired
"here are some extra variables this expression may use" semantics
for the new clause.
Reusing the ``with`` keyword has also been proposed. This has the
advantage of avoiding the addition of a new keyword, but also has
a high potential for confusion as the ``with`` clause and ``with``
statement would look similar but do completely different things.
That way lies C++ and Perl :)
Relation to PEP 403
-------------------
:pep:`403` (General Purpose Decorator Clause) attempts to achieve the main
goals of this PEP using a less radical language change inspired by the
existing decorator syntax.
Despite having the same author, the two PEPs are in direct competition with
each other. :pep:`403` represents a minimalist approach that attempts to achieve
useful functionality with a minimum of change from the status quo. This PEP
instead aims for a more flexible standalone statement design, which requires
a larger degree of change to the language.
Note that where :pep:`403` is better suited to explaining the behaviour of
generator expressions correctly, this PEP is better able to explain the
behaviour of decorator clauses in general. Both PEPs support adequate
explanations for the semantics of container comprehensions.
Explaining Container Comprehensions and Generator Expressions
-------------------------------------------------------------
One interesting feature of the proposed construct is that it can be used as
a primitive to explain the scoping and execution order semantics of
container comprehensions::
seq2 = [x for x in y if q(x) for y in seq if p(y)]
# would be equivalent to
seq2 = ?.result given seq=seq:
result = []
for y in seq:
if p(y):
for x in y:
if q(x):
result.append(x)
The important point in this expansion is that it explains why comprehensions
appear to misbehave at class scope: only the outermost iterator is evaluated
at class scope, while all predicates, nested iterators and value expressions
are evaluated inside a nested scope.
Not that, unlike :pep:`403`, the current version of this PEP *cannot*
provide a precisely equivalent expansion for a generator expression. The
closest it can get is to define an additional level of scoping::
seq2 = ?.g(seq) given:
def g(seq):
for y in seq:
if p(y):
for x in y:
if q(x):
yield x
This limitation could be remedied by permitting the given clause to be
a generator function, in which case ? would refer to a generator-iterator
object rather than a simple namespace::
seq2 = ? given seq=seq in:
for y in seq:
if p(y):
for x in y:
if q(x):
yield x
However, this would make the meaning of "?" quite ambiguous, even more so
than is already the case for the meaning of ``def`` statements (which will
usually have a docstring indicating whether or not a function definition is
actually a generator)
Explaining Decorator Clause Evaluation and Application
------------------------------------------------------
The standard explanation of decorator clause evaluation and application
has to deal with the idea of hidden compiler variables in order to show
steps in their order of execution. The given statement allows a decorated
function definition like::
@classmethod
def classname(cls):
return cls.__name__
To instead be explained as roughly equivalent to::
classname = .d1(classname) given:
d1 = classmethod
def classname(cls):
return cls.__name__
Anticipated Objections
----------------------
Two Ways To Do It
~~~~~~~~~~~~~~~~~
A lot of code may now be written with values defined either before the
expression where they are used or afterwards in a ``given`` clause, creating
two ways to do it, perhaps without an obvious way of choosing between them.
On reflection, I feel this is a misapplication of the "one obvious way"
aphorism. Python already offers *lots* of ways to write code. We can use
a for loop or a while loop, a functional style or an imperative style or an
object oriented style. The language, in general, is designed to let people
write code that matches the way they think. Since different people think
differently, the way they write their code will change accordingly.
Such stylistic questions in a code base are rightly left to the development
group responsible for that code. When does an expression get so complicated
that the subexpressions should be taken out and assigned to variables, even
though those variables are only going to be used once? When should an inline
while loop be replaced with a generator that implements the same logic?
Opinions differ, and that's OK.
However, explicit :pep:`8` guidance will be needed for CPython and the standard
library, and that is discussed in the proposal above.
Out of Order Execution
~~~~~~~~~~~~~~~~~~~~~~
The ``given`` clause makes execution jump around a little strangely, as the
body of the ``given`` clause is executed before the simple statement in the
clause header. The closest any other part of Python comes to this is the out
of order evaluation in list comprehensions, generator expressions and
conditional expressions and the delayed application of decorator functions to
the function they decorate (the decorator expressions themselves are executed
in the order they are written).
While this is true, the syntax is intended for cases where people are
themselves *thinking* about a problem out of sequence (at least as far as
the language is concerned). As an example of this, consider the following
thought in the mind of a Python user:
I want to sort the items in this sequence according to the values of
attr1 and attr2 on each item.
If they're comfortable with Python's ``lambda`` expressions, then they might
choose to write it like this::
sorted_list = sorted(original, key=(lambda v: v.attr1, v.attr2))
That gets the job done, but it hardly reaches the standard of ``executable
pseudocode`` that fits Python's reputation.
If they don't like ``lambda`` specifically, the ``operator`` module offers an
alternative that still allows the key function to be defined inline::
sorted_list = sorted(original,
key=operator.attrgetter(v. 'attr1', 'attr2'))
Again, it gets the job done, but even the most generous of readers would
not consider that to be "executable pseudocode".
If they think both of the above options are ugly and confusing, or they need
logic in their key function that can't be expressed as an expression (such
as catching an exception), then Python currently forces them to reverse the
order of their original thought and define the sorting criteria first::
def sort_key(item):
return item.attr1, item.attr2
sorted_list = sorted(original, key=sort_key)
"Just define a function" has been the rote response to requests for multi-line
lambda support for years. As with the above options, it gets the job done,
but it really does represent a break between what the user is thinking and
what the language allows them to express.
I believe the proposal in this PEP would finally let Python get close to the
"executable pseudocode" bar for the kind of thought expressed above::
sorted_list = sorted(original, key=?.key) given:
def key(item):
return item.attr1, item.attr2
Everything is in the same order as it was in the user's original thought, and
they don't even need to come up with a name for the sorting criteria: it is
possible to reuse the keyword argument name directly.
A possible enhancement to those proposal would be to provide a convenient
shorthand syntax to say "use the given clause contents as keyword
arguments". Even without dedicated syntax, that can be written simply as
``**vars(?)``.
Harmful to Introspection
~~~~~~~~~~~~~~~~~~~~~~~~
Poking around in module and class internals is an invaluable tool for
white-box testing and interactive debugging. The ``given`` clause will be
quite effective at preventing access to temporary state used during
calculations (although no more so than current usage of ``del`` statements
in that regard).
While this is a valid concern, design for testability is an issue that
cuts across many aspects of programming. If a component needs to be tested
independently, then a ``given`` statement should be refactored in to separate
statements so that information is exposed to the test suite. This isn't
significantly different from refactoring an operation hidden inside a
function or generator out into its own function purely to allow it to be
tested in isolation.
Lack of Real World Impact Assessment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The examples in the current PEP are almost all relatively small "toy"
examples. The proposal in this PEP needs to be subjected to the test of
application to a large code base (such as the standard library or a large
Twisted application) in a search for examples where the readability of real
world code is genuinely enhanced.
This is more of a deficiency in the PEP rather than the idea, though. If
it wasn't a real world problem, we wouldn't get so many complaints about
the lack of multi-line lambda support and Ruby's block construct
probably wouldn't be quite so popular.
Open Questions
==============
Syntax for Forward References
-----------------------------
The ``?`` symbol is proposed for forward references to the given namespace
as it is short, currently unused and suggests "there's something missing
here that will be filled in later".
The proposal in the PEP doesn't neatly parallel any existing Python feature,
so reusing an already used symbol has been deliberately avoided.
Handling of ``nonlocal`` and ``global``
---------------------------------------
``nonlocal`` and ``global`` are explicitly disallowed in the ``given`` clause
suite and will be syntax errors if they occur. They will work normally if
they appear within a ``def`` statement within that suite.
Alternatively, they could be defined as operating as if the anonymous
functions were defined as in the expansion above.
Handling of ``break`` and ``continue``
--------------------------------------
``break`` and ``continue`` will operate as if the anonymous functions were
defined as in the expansion above. They will be syntax errors if they occur
in the ``given`` clause suite but will work normally if they appear within
a ``for`` or ``while`` loop as part of that suite.
Handling of ``return`` and ``yield``
------------------------------------
``return`` and ``yield`` are explicitly disallowed in the ``given`` clause
suite and will be syntax errors if they occur. They will work normally if
they appear within a ``def`` statement within that suite.
Examples
========
Defining callbacks for event driven programming::
# Current Python (definition before use)
def cb(sock):
# Do something with socket
def eb(exc):
logging.exception(
"Failed connecting to %s:%s", host, port)
loop.create_connection((host, port), cb, eb) given:
# Becomes:
loop.create_connection((host, port), ?.cb, ?.eb) given:
def cb(sock):
# Do something with socket
def eb(exc):
logging.exception(
"Failed connecting to %s:%s", host, port)
Defining "one-off" classes which typically only have a single instance::
# Current Python (instantiation after definition)
class public_name():
... # However many lines
public_name = public_name(*params)
# Current Python (custom decorator)
def singleton(*args, **kwds):
def decorator(cls):
return cls(*args, **kwds)
return decorator
@singleton(*params)
class public_name():
... # However many lines
# Becomes:
public_name = ?.MeaningfulClassName(*params) given:
class MeaningfulClassName():
... # Should trawl the stdlib for an example of doing this
Calculating attributes without polluting the local namespace (from os.py)::
# Current Python (manual namespace cleanup)
def _createenviron():
... # 27 line function
environ = _createenviron()
del _createenviron
# Becomes:
environ = ?._createenviron() given:
def _createenviron():
... # 27 line function
Replacing default argument hack (from functools.lru_cache)::
# Current Python (default argument hack)
def decorating_function(user_function,
tuple=tuple, sorted=sorted, len=len, KeyError=KeyError):
... # 60 line function
return decorating_function
# Becomes:
return ?.decorating_function given:
# Cell variables rather than locals, but should give similar speedup
tuple, sorted, len, KeyError = tuple, sorted, len, KeyError
def decorating_function(user_function):
... # 60 line function
# This example also nicely makes it clear that there is nothing in the
# function after the nested function definition. Due to additional
# nested functions, that isn't entirely clear in the current code.
Possible Additions
==================
* The current proposal allows the addition of a ``given`` clause only
for simple statements. Extending the idea to allow the use of
compound statements would be quite possible (by appending the given
clause as an independent suite at the end), but doing so raises
serious readability concerns (as values defined in the ``given``
clause may be used well before they are defined, exactly the kind
of readability trap that other features like decorators and ``with``
statements are designed to eliminate)
* The "explicit early binding" variant may be applicable to the discussions
on python-ideas on how to eliminate the default argument hack. A ``given``
clause in the header line for functions (after the return type annotation)
may be the answer to that question.
Rejected Alternatives
=====================
* An earlier version of this PEP allowed implicit forward references to the
names in the trailing suite, and also used implicit early binding
semantics. Both of these ideas substantially complicated the proposal
without providing a sufficient increase in expressive power. The current
proposing with explicit forward references and early binding brings the
new construct into line with existing scoping semantics, greatly
improving the chances the idea can actually be implemented.
* In addition to the proposals made here, there have also been suggestions
of two suite "in-order" variants which provide the limited scoping of
names without supporting out-of-order execution. I believe these
suggestions largely miss the point of what people are complaining about
when they ask for multi-line lambda support - it isn't that coming up
with a name for the subexpression is especially difficult, it's that
naming the function before the statement that uses it means the code
no longer matches the way the developer thinks about the problem at hand.
* I've made some unpublished attempts to allow direct references to the
closure implicitly created by the ``given`` clause, while still retaining
the general structure of the syntax as defined in this PEP (For example,
allowing a subexpression like ``?given`` or ``:given`` to be used in
expressions to indicate a direct reference to the implied closure, thus
preventing it from being called automatically to create the local namespace).
All such attempts have appeared unattractive and confusing compared to
the simpler decorator-inspired proposal in :pep:`403`.
Reference Implementation
========================
None as yet. If you want a crash course in Python namespace
semantics and code compilation, feel free to try ;)
TO-DO
=====
* Mention :pep:`359` and possible uses for locals() in the ``given`` clause
* Figure out if this can be used internally to make the implementation of
zero-argument super() calls less awful
References
==========
.. [1] `Explicitation lines in Python
<https://mail.python.org/pipermail/python-ideas/2010-June/007476.html>`__
.. [2] `'where' statement in Python
<https://mail.python.org/pipermail/python-ideas/2010-July/007584.html>`__
.. [3] `Where-statement (Proposal for function expressions)
<https://mail.python.org/pipermail/python-ideas/2009-July/005132.html>`__
.. [4] `Name conflict with NumPy for 'where' keyword choice
<https://mail.python.org/pipermail/python-ideas/2010-July/007596.html>`__
.. [6] `Assignments in list/generator expressions
<https://mail.python.org/pipermail/python-ideas/2011-April/009863.html>`__
.. [7] `Possible PEP 3150 style guidelines (#1)
<https://mail.python.org/pipermail/python-ideas/2011-April/009869.html>`__
.. [8] `Discussion of PEP 403 (statement local function definition)
<https://mail.python.org/pipermail/python-ideas/2011-October/012276.html>`__
.. [9] `Possible PEP 3150 style guidelines (#2)
<https://mail.python.org/pipermail/python-ideas/2011-October/012341.html>`__
* `The "Status quo wins a stalemate" design principle
<https://www.curiousefficiency.org/posts/2011/02/status-quo-wins-stalemate.html>`__
* `Multi-line lambdas (again!)
<https://mail.python.org/pipermail/python-ideas/2013-August/022526.html>`__
Copyright
=========
This document has been placed in the public domain.