Update PEP 3150 to reflect April discussion on python-ideas

Nick Coghlan 2011-06-13 01:37:12 +10:00
parent c9c2592056
commit acc5285f29
1 changed files with 313 additions and 163 deletions


@ -8,69 +8,67 @@ Type: Standards Track
Content-Type: text/x-rst
Created: 2010-07-09
Python-Version: 3.3
Post-History: 2010-07-14
Post-History: 2010-07-14, 2011-04-21, 2011-06-13
Resolution: TBD
Abstract
========
A recurring proposal ([1], [2], [3]) on python-ideas is the addition of some form of
statement local namespace.
This PEP proposes the addition of an optional ``given`` clause to several
Python statements that do not currently have an associated code suite. This
clause will create a statement local namespace for additional names that are
accessible in the associated statement, but do not become part of the
containing namespace.
This PEP is intended to serve as a focal point for those ideas, so we
can hopefully avoid retreading the same ground a couple of times a
year. Even if the proposal is never accepted, having a PEP to point
people to can be valuable (e.g. having PEP 315 helps greatly in avoiding
endless rehashing of loop-and-a-half arguments).
The primary motivation is to elevate ordinary assignment statements to be
on par with ``class`` and ``def`` statements where the name of the item to
be defined is presented to the reader in advance of the details of how the
value of that item is calculated.
The ideas in this PEP are just a sketch of a way this concept might work.
They avoid some pitfalls that have been encountered in the past, but
have not themselves been subject to the test of implementation.
A secondary motivation is to simplify interim calculations in module and
class level code without polluting the resulting namespaces.
There are additional emergent properties of the proposed solution which may
be of interest to some users. Most notably, it is proposed that this clause
use a new kind of scope that performs early binding of variables, potentially
replacing other techniques that achieve the same effect (such as the "default
argument hack").
The specific proposal in this PEP has been informed by various explorations
of this and related concepts over the years (e.g. [1], [2], [3], [6]). It avoids
some pitfalls that have been encountered in the past, but has not yet itself
been subject to the test of implementation.
PEP Deferral
============
This PEP is currently deferred at least until the language moratorium
(PEP 3003) is officially lifted by Guido. Even after that, it will
require input from at least the four major Python implementations
(CPython, PyPy, Jython, IronPython) on the feasibility of implementing
the proposed semantics to get it moving again. Input from related
projects with a vested interest in Python's syntax (e.g. Cython) will
also be valuable.
Despite the lifting of the language moratorium (PEP 3003) for Python 3.3,
this PEP currently remains in a Deferred state. That means the PEP has to
pass at least *two* hurdles to become part of 3.3.
That said, if a decision on acceptance or rejection had to be made
immediately, rejection would be far more likely. Unlike the previous
major syntax addition to Python (PEP 343's ``with`` statement), this
PEP has no "killer application" of code that is clearly and obviously
improved through the use of the new syntax. The ``with`` statement (in
conjunction with the generator enhancements in PEP 342) allowed
exception handling to be factored out into context managers in a way
that had never before been possible. Code using the new statement was
not only easier to read, but much easier to write correctly in the
first place.
Firstly, I personally have to be sufficiently convinced of the PEP's value and
feasibility to return it to Draft status. While I do see merit in the concept
of statement local namespaces (otherwise I wouldn't have spent so much time
pondering the idea over the years), I also have grave doubts as to the wisdom
of actually adding it to the language (see "Key Concern" below).
In the case of this PEP, however, the "Two Ways to Do It" objection is a
strong one. While the ability to break out subexpressions of a statement
without having to worry about name clashes with the rest of a
function or script and without distracting from the operation that is
the ultimate aim of the statement is potentially nice to have as a
language feature, it doesn't really provide significant expressive power
over and above what is already possible by assigning subexpressions to
ordinary local variables before the statement of interest. In particular,
explaining to new Python programmers when it is best to use a ``given``
clause and when to use normal local variables is likely to be challenging
and an unnecessary distraction.
Secondly, Guido van Rossum (or his delegate) will need to accept the PEP. At
the very least, that will not occur until a fully functional draft
implementation for CPython is available, and the other three major Python
implementations (PyPy, Jython, IronPython) have indicated that they consider
it feasible to implement the proposed semantics once they reach the point of
targeting 3.3 compatibility. Input from related projects with a vested
interest in Python's syntax (e.g. Cython) will also be valuable.
"It might be kinda, sorta, nice to have, sometimes" really isn't a strong
argument for a new syntactic construct (particularly one this complicated).
Proposal
========
This PEP proposes the addition of an optional ``given`` clause to the
syntax for simple statements which may contain an expression. The
syntax for simple statements which may contain an expression, or may
substitute for such an expression for purely syntactic purposes. The
current list of simple statements that would be affected by this
addition is as follows:
@ -82,88 +80,78 @@ addition is as follows:
* yield statement
* raise statement
* assert statement
* pass statement
The ``given`` clause would allow subexpressions to be referenced by
name in the header line, with the actual definitions following in
the indented clause. As a simple example::
c = sqrt(a*a + b*b) given:
a = retrieve_a()
b = retrieve_b()
a, b = 3, 4
The ``pass`` statement is included to provide a consistent way to skip
inclusion of a meaningful expression in the header line. While this is not
an intended use case, it isn't one that can be prevented as multiple
alternatives (such as ``...`` and ``()``) remain available even if ``pass``
itself is disallowed.
Rationale
=========
Some past language features (specifically function decorators
and list comprehensions) were motivated, at least in part, by
the desire to give the important parts of a statement more
prominence when reading code. In the case of function decorators,
information such as whether or not a method is a class or static
method can now be found in the function definition rather than
after the function body. List comprehensions similarly take the
expression being assigned to each member of the list and move it
to the beginning of the expression rather than leaving it buried
inside a ``for`` loop.
Function and class statements in Python have a unique property
relative to ordinary assignment statements: to some degree, they are
*declarative*. They present the reader of the code with some critical
information about a name that is about to be defined, before
proceeding on with the details of the actual definition in the
function or class body.
The rationale for the ``given`` clause is similar. Currently,
breaking out a subexpression requires naming that subexpression
*before* the actual statement of interest. The ``given`` clause
is designed to allow a programmer to highlight for the reader
the statement which is actually of interest (and presumably has
significance for later code) while hiding the most likely irrelevant
"calculation details" inside an indented suite.
The *name* of the object being declared is the first thing stated
after the keyword. Other important information is also given the
honour of preceding the implementation details:
Using the simple example from the proposal section, the current Python
equivalent would be::
- decorators (which can greatly affect the behaviour of the created
object, and were placed ahead of even the keyword and name as a matter
of practicality more so than aesthetics)
- the docstring (on the first line immediately following the header line)
- parameters, default values and annotations for function definitions
- parent classes, metaclass and optionally other details (depending on
the metaclass) for class definitions
a = retrieve_a()
b = retrieve_b()
c = sqrt(a*a + b*b)
This PEP proposes to make a similar declarative style available for
arbitrary assignment operations, by permitting the inclusion of a
"given" suite following any simple assignment statement::
If later code is only interested in the value of c, then the
details involved in retrieving the values of a and b may be an
unnecessary distraction to the reader (particularly if those
details are more complicated than the simple function calls
shown in the example).
TARGET = [TARGET2 = ... TARGETN =] EXPR given:
SUITE
To use a more illustrative example (courtesy of Alex Light),
which of the following is easier to comprehend?
By convention, code in the body of the suite should be oriented solely
towards correctly defining the assignment operation carried out in the
header line. The header line operation should also be adequately
descriptive (e.g. through appropriate choices of variable names) to
give a reader a reasonable idea of the purpose of the operation
without reading the body of the suite.
Subexpressions up front?::
    sea = water()
    temp = get_temperature(sea)
    depth = get_depth(sea)
    purity = get_purity(sea)
    salinity = get_salinity(sea)
    size = get_size(sea)
    density = get_density(sea)
    desired_property = calc_value(temp, depth, purity,
                                  salinity, size, density)
    # Further operations using desired_property
Or subexpressions indented?::
    desired_property = calc_value(temp, depth, purity,
                                  salinity, size, density) given:
        sea = water()
        temp = get_temperature(sea)
        depth = get_depth(sea)
        purity = get_purity(sea)
        salinity = get_salinity(sea)
        size = get_size(sea)
        density = get_density(sea)
    # Further operations using desired_property
However, while they are the initial motivating use case, limiting this
feature solely to simple assignments would be overly restrictive. Once the
feature is defined at all, it would be quite arbitrary to prevent its use
for augmented assignments, return statements, yield expressions and
arbitrary expressions that may modify the application state.
The ``given`` clause may also function as a more readable
alternative to some uses of lambda expressions and similar
constructs when passing one-off functions to operations
like ``sorted``.
like ``sorted()``.
One way to think of the proposed clause is as a middle
ground between normal in-line code and separation of an
In module and class level code, the ``given`` clause will serve as a
clear and reliable replacement for usage of the ``del`` statement to keep
interim working variables from polluting the resulting namespace.
One potentially useful way to think of the proposed clause is as a middle
ground between conventional in-line code and separation of an
operation out into a dedicated function.
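The two ends of that middle ground can be written in current Python. The following is only a sketch using the values from the simple example in the Proposal section; the helper name ``_hypotenuse`` is hypothetical and not taken from the PEP.

```python
from math import sqrt

# Conventional in-line code: the working names a and b stay in the namespace.
a, b = 3, 4
c_inline = sqrt(a*a + b*b)

# Dedicated function: a and b are local to the helper, but the helper itself
# now needs a name, which then has to be cleaned up by hand.
def _hypotenuse():
    a, b = 3, 4
    return sqrt(a*a + b*b)

c_function = _hypotenuse()
del _hypotenuse  # manual cleanup of the interim helper
```

Both forms compute the same value; the ``given`` clause is aimed at keeping the header line of the first form while getting the namespace isolation of the second.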
Keyword Choice
==============
@ -185,6 +173,7 @@ a high potential for confusion as the ``with`` clause and ``with``
statement would look similar but do completely different things.
That way lies C++ and Perl :)
Syntax Change
=============
@ -193,6 +182,7 @@ Current::
expr_stmt: testlist_star_expr (augassign (yield_expr|testlist) |
('=' (yield_expr|testlist_star_expr))*)
del_stmt: 'del' exprlist
pass_stmt: 'pass'
return_stmt: 'return' [testlist]
yield_stmt: yield_expr
raise_stmt: 'raise' [test ['from' test]]
@ -204,6 +194,7 @@ New::
expr_stmt: testlist_star_expr (augassign (yield_expr|testlist) |
('=' (yield_expr|testlist_star_expr))*) [given_clause]
del_stmt: 'del' exprlist [given_clause]
pass_stmt: 'pass' [given_clause]
return_stmt: 'return' [testlist] [given_clause]
yield_stmt: yield_expr [given_clause]
raise_stmt: 'raise' [test ['from' test]] [given_clause]
@ -218,12 +209,12 @@ rather than as a new kind of compound statement in order to avoid creating
an ambiguity in the grammar. It is applied only to the specific elements
listed so that nonsense like the following is disallowed::
pass given:
break given:
a = b = 1
However, even this is inadequate, as it creates problems for the definition
of simple_stmt (which allows chaining of multiple single line statements
with ";" rather than "\\n").
However, the precise Grammar change described above is inadequate, as it
creates problems for the definition of simple_stmt (which allows chaining of
multiple single line statements with ";" rather than "\\n").
So the above syntax change should instead be taken as a statement of intent.
Any actual proposal would need to resolve the simple_stmt parsing problem
@ -234,7 +225,7 @@ subexpressions and then allowing a single one of those statements with
a ``given`` clause at the simple_stmt level. Something along the lines of::
stmt: simple_stmt | given_stmt | compound_stmt
simple_stmt: small_stmt (';' small_stmt)* [';'] NEWLINE
simple_stmt: small_stmt (';' (small_stmt | subexpr_stmt))* [';'] NEWLINE
small_stmt: (pass_stmt | flow_stmt | import_stmt |
global_stmt | nonlocal_stmt)
flow_stmt: break_stmt | continue_stmt
@ -253,54 +244,13 @@ For reference, here are the current definitions at that level::
flow_stmt: break_stmt | continue_stmt | return_stmt | raise_stmt | yield_stmt
Common Objections
=================
* Two Ways To Do It: a lot of code may now be written with values
defined either before the expression where they are used or
afterwards in a ``given`` clause, creating two ways to do it,
without an obvious way of choosing between them.
* Out of Order Execution: the ``given`` clause makes execution
jump around a little strangely, as the body of the ``given``
clause is executed before the simple statement in the clause
header. The closest any other part of Python comes to this
is the out of order evaluation in list comprehensions,
generator expressions and conditional expressions.
These objections should not be dismissed lightly - the proposal
in this PEP needs to be subjected to the test of application to
a large code base (such as the standard library) in a search
for examples where the readability of real world code is genuinely
enhanced.
New PEP 8 guidelines would also need to be developed to provide
appropriate direction on when to use the ``given`` clause over
ordinary variable assignments.
Possible Additions
==================
* The current proposal allows the addition of a ``given`` clause only
for simple statements. Extending the idea to allow the use of
compound statements would be quite possible, but doing so raises
serious readability concerns (as values defined in the ``given``
clause may be used well before they are defined, exactly the kind
of readability trap that other features like decorators and ``with``
statements are designed to eliminate)
* Currently only the outermost clause of comprehensions and generator
expressions can reference the surrounding namespace when executed
at class level. If this proposal is implemented successfully, the
associated namespace semantics could allow that restriction to be
lifted. There would be backwards compatibility implications in doing
so as existing code may be relying on the behaviour of ignoring
class level variables, but the idea is worth considering.
Possible Implementation Strategy
================================
Torture Test
============
------------
An implementation of this PEP must support execution of the following
An implementation of this PEP should support execution of the following
code at module, class and function scope::
b = {}
@ -341,9 +291,8 @@ you might rightly say, but legal::
>>> b
{42: 42}
Possible Implementation Strategy
================================
Details of Proposed Semantics
-----------------------------
AKA How Class Scopes Screw You When Attempting To Implement This
@ -398,6 +347,175 @@ would then translate to something like the following::
However, as noted in the abstract, an actual implementation of
this idea has never been tried.
Detailed Semantics #1: Early Binding of Variable References
-----------------------------------------------------------
The copy-in-copy-out semantics mean that all variable references from a
``given`` clause will exhibit early binding behaviour, in contrast to the
late binding typically seen with references to closure variables and globals
in ordinary functions. This behaviour will allow the ``given`` clause to
be used as a substitute for the default argument hack when early binding
behaviour is desired::
    # Current Python (late binding)
    seq = []
    for i in range(10):
        def f():
            return i
        seq.append(f)
    assert [f() for f in seq] == [9]*10

    # Current Python (early binding via default argument hack)
    seq = []
    for i in range(10):
        def f(_i=i):
            return _i
        seq.append(f)
    assert [f() for f in seq] == list(range(10))

    # Early binding via given clause
    seq = []
    for i in range(10):
        seq.append(f) given:
            def f():
                return i
    assert [f() for f in seq] == list(range(10))
Note that the current intention is for the copy-in/copy-out semantics to
apply only to names defined in the local scope containing the ``given``
clause. Names in outer scopes will be referenced as normal.
This intention is subject to revision based on feedback and practicalities
of implementation.
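The late and early binding behaviour of current Python can be checked directly. The sketch below is illustrative only: ``make_f`` is a hypothetical helper, and using a factory function as a stand-in for what the ``given`` clause is meant to achieve is an assumption, not part of the proposal.

```python
# Late binding: every closure looks up i when called, after the loop is done.
late = []
for i in range(10):
    def f():
        return i
    late.append(f)
late_results = [f() for f in late]

# Early binding via the default argument hack: i is captured at definition.
hacked = []
for i in range(10):
    def f(_i=i):
        return _i
    hacked.append(f)
hacked_results = [f() for f in hacked]

# Early binding via a factory function (hypothetical stand-in for ``given``):
# each call to make_f creates a fresh scope holding its own value.
def make_f(value):
    def f():
        return value
    return f

factory_results = [f() for f in [make_f(i) for i in range(10)]]
```

The factory version avoids exposing an extra ``_i`` parameter in the signature of ``f``, which is the main complaint about the default argument hack.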
Detailed Semantics #2: Handling of ``nonlocal`` and ``global``
--------------------------------------------------------------
``nonlocal`` and ``global`` will largely operate as if the anonymous
functions were defined as in the expansion above. However, they will also
override the default early-binding semantics for names from the containing
scope.
This intention is subject to revision based on feedback and practicalities
of implementation.
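As a rough illustration in current Python, the suite can be treated as an ordinary nested function. This is an assumption made for the sketch, and ``tally`` and ``_given_suite`` are hypothetical names. Under that reading, ``nonlocal`` lets the suite rebind a name in the containing function instead of working on a private copy:

```python
def tally():
    count = 0

    def _given_suite():
        # Without this declaration, the augmented assignment below would
        # fail, because count would be treated as an unbound local name.
        nonlocal count
        count += 1
        return count * 10

    result = _given_suite()
    # The rebinding made inside the nested function is visible here.
    return result, count

outcome = tally()
```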
Detailed Semantics #3: Handling of ``break`` and ``continue``
-------------------------------------------------------------
``break`` and ``continue`` will operate as if the anonymous functions were
defined as in the expansion above. They will be syntax errors if they occur
in the ``given`` clause suite but will work normally if they appear within
a ``for`` or ``while`` loop as part of that suite.
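Ordinary nested functions already behave this way, which can be confirmed with ``compile()``. The sketch below assumes the anonymous function model described above; the ``compiles`` helper and the sample sources are hypothetical.

```python
def compiles(source):
    """Return True if source compiles, False if it raises SyntaxError."""
    try:
        compile(source, "<sketch>", "exec")
    except SyntaxError:
        return False
    return True

# A bare break in a function body is rejected, even though the function
# definition itself sits inside a loop.
bare_break = (
    "for i in range(3):\n"
    "    def f():\n"
    "        break\n"
)

# A break inside a loop that belongs to the function body is accepted.
looped_break = (
    "for i in range(3):\n"
    "    def f():\n"
    "        for j in range(3):\n"
    "            break\n"
)

bare_ok = compiles(bare_break)
looped_ok = compiles(looped_break)
```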
Detailed Semantics #4: Handling of ``return`` and ``yield``
-------------------------------------------------------------
``return`` and ``yield`` are explicitly disallowed in the ``given`` clause
suite and will be syntax errors if they occur. They will work normally if
they appear within a ``def`` statement within that suite.
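Class bodies are the closest existing analogue for this rule: ``return`` and ``yield`` are rejected directly in the body but accepted inside a nested ``def``. The sketch below relies on that analogy, which is an assumption rather than something the PEP states; the ``compiles`` helper and the sample sources are hypothetical.

```python
def compiles(source):
    """Return True if source compiles, False if it raises SyntaxError."""
    try:
        compile(source, "<sketch>", "exec")
    except SyntaxError:
        return False
    return True

# Directly in a class body, both statements are syntax errors.
return_in_class = "class C:\n    return 1\n"
yield_in_class = "class C:\n    yield 1\n"

# Inside a def nested in the class body, they work normally.
return_in_method = (
    "class C:\n"
    "    def m(self):\n"
    "        return 1\n"
)
yield_in_method = (
    "class C:\n"
    "    def m(self):\n"
    "        yield 1\n"
)

results = [compiles(src) for src in
           (return_in_class, yield_in_class, return_in_method, yield_in_method)]
```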
Examples
========
Defining "one-off" classes which typically only have a single instance::
    # Current Python (instantiation after definition)
    class public_name():
        ... # However many lines
    public_name = public_name(*params)

    # Becomes:
    public_name = MeaningfulClassName(*params) given:
        class MeaningfulClassName():
            ... # Should trawl the stdlib for an example of doing this
Calculating attributes without polluting the local namespace (from os.py)::
    # Current Python (manual namespace cleanup)
    def _createenviron():
        ... # 27 line function

    environ = _createenviron()
    del _createenviron

    # Becomes:
    environ = _createenviron() given:
        def _createenviron():
            ... # 27 line function
Replacing default argument hack (from functools.lru_cache)::
    # Current Python (default argument hack)
    def decorating_function(user_function,
                   tuple=tuple, sorted=sorted, len=len, KeyError=KeyError):
        ... # 60 line function
    return decorating_function

    # Becomes:
    return decorating_function given:
        # Cell variables rather than locals, but should give similar speedup
        tuple, sorted, len, KeyError = tuple, sorted, len, KeyError
        def decorating_function(user_function):
            ... # 60 line function

    # This example also nicely makes it clear that there is nothing in the
    # function after the nested function definition. Due to additional
    # nested functions, that isn't entirely clear in the current code.
Anticipated Objections
======================
* Two Ways To Do It: a lot of code may now be written with values
defined either before the expression where they are used or
afterwards in a ``given`` clause, creating two ways to do it,
without an obvious way of choosing between them.
* Out of Order Execution: the ``given`` clause makes execution
jump around a little strangely, as the body of the ``given``
clause is executed before the simple statement in the clause
header. The closest any other part of Python comes to this
is the out of order evaluation in list comprehensions,
generator expressions and conditional expressions.
* Harmful to Introspection: poking around in module and class internals
is an invaluable tool for white-box testing and interactive debugging.
The ``given`` clause will be quite effective at preventing access to
temporary state used during calculations (although no more so than
current usage of ``del`` statements in that regard).
These objections should not be dismissed lightly - the proposal
in this PEP needs to be subjected to the test of application to
a large code base (such as the standard library) in a search
for examples where the readability of real world code is genuinely
enhanced.
New PEP 8 guidelines would also need to be developed to provide
appropriate direction on when to use the ``given`` clause over
ordinary variable assignments. Some thoughts on possible guidelines are
provided at [7].
Possible Additions
==================
* The current proposal allows the addition of a ``given`` clause only
for simple statements. Extending the idea to allow the use of
compound statements would be quite possible, but doing so raises
serious readability concerns (as values defined in the ``given``
clause may be used well before they are defined, exactly the kind
of readability trap that other features like decorators and ``with``
statements are designed to eliminate).
* Currently only the outermost clause of comprehensions and generator
expressions can reference the surrounding namespace when executed
at class level. If this proposal is implemented successfully, the
associated namespace semantics could allow that restriction to be
lifted. There would be backwards compatibility implications in doing
so as existing code may be relying on the behaviour of ignoring
class level variables, but the idea is worth considering.
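The class-level restriction mentioned in the last item can be observed in current Python 3. The sketch below is illustrative only; the class and attribute names are hypothetical.

```python
class Works:
    values = [1, 2, 3]
    # The outermost iterable is evaluated in the class scope, so the
    # reference to values succeeds.
    copied = [v for v in values]

try:
    class Fails:
        values = [1, 2, 3]
        factor = 10
        # factor is looked up from inside the comprehension itself, which
        # skips the class namespace, so this raises NameError.
        scaled = [v * factor for v in values]
except NameError as exc:
    error = exc
else:
    error = None
```

Only the reference to ``values`` in the outermost ``for`` clause is resolved against the class namespace; every other name in the comprehension is not.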
Reference Implementation
========================
@ -406,21 +524,47 @@ None as yet. If you want a crash course in Python namespace
semantics and code compilation, feel free to try ;)
Key Concern
===========
If a decision on the acceptance or rejection of this PEP had to be made
immediately, rejection would be far more likely. Unlike the previous
major syntax addition to Python (PEP 343's ``with`` statement), this
PEP as yet has no "killer application" of common code that is clearly and
obviously improved through the use of the new syntax. The ``with`` statement
(in conjunction with the generator enhancements in PEP 342) allowed
exception handling to be factored out into context managers in a way
that had never before been possible. Code using the new statement was
not only easier to read, but much easier to write correctly in the
first place.
In the case of this PEP, however, the "Two Ways to Do It" objection is a
strong one. While the ability to break out subexpressions of a statement
without having to worry about name clashes with the rest of a
function or script and without distracting from the operation that is
the ultimate aim of the statement is potentially nice to have as a
language feature, it doesn't really provide significant expressive power
over and above what is already possible by assigning subexpressions to
ordinary local variables before the statement of interest. In particular,
explaining to new Python programmers when it is best to use a ``given``
clause and when to use normal local variables is likely to be challenging
and an unnecessary distraction.
"It might be kinda, sorta, nice to have, sometimes" really isn't a strong
argument for a new syntactic construct (particularly one this complicated).
"Status quo wins a stalemate" [5] is a very useful design principle, and I'm
not yet convinced that this PEP clears that hurdle.
The case for it has definitely strengthened over time though, which is why
this PEP remains Deferred rather than Rejected.
TO-DO
=====
* Mention two-suite in-order variants (and explain why they're even more
pointless than the specific idea in the PEP)
* Mention PEP 359 and possible uses for locals() in the ``given`` clause
* Define the expected semantics of ``break``, ``continue``, ``return``
and ``yield`` in a ``given`` clause (i.e. syntax errors at the clause
level, but allowed inside the appropriate compound statements)
* Describe the expected semantics of ``nonlocal`` and ``global`` in the
``given`` clause.
* Describe the name lookup semantics for function definitions in a
``given`` clause at function, class and module scope. In particular,
note the early binding effect on loop variables or other variables
that are rebound after the ``given`` clause is complete.
References
@ -434,6 +578,12 @@ References
.. [4] http://mail.python.org/pipermail/python-ideas/2010-July/007596.html
.. [5] http://www.boredomandlaziness.org/2011/02/status-quo-wins-stalemate.html
.. [6] http://mail.python.org/pipermail/python-ideas/2011-April/009863.html
.. [7] http://mail.python.org/pipermail/python-ideas/2011-April/009869.html
Copyright
=========