Updated the PEP for GvR.

* Out of order evaluation is out of favor.
* So are ideas that do not provide for short-circuiting.
* (if <condition>: <expression1> else: <expression2>) is in vogue.
* <condition> ?? <expression1> || <expression2> is a new contender.
* cond(<condition>, <expression1>, <expression2>) is viable if implemented
      as a keyword and has short-circuit behavior.
* Added a summary of a few ideas from the last couple hundred posts
      from comp.lang.python.
Raymond Hettinger 2003-02-11 05:43:56 +00:00
parent 1264fe2202
commit 294fd7ade7
1 changed files with 134 additions and 70 deletions


@ -2,7 +2,7 @@ PEP: 308
Title: If-then-else expression
Version: $Revision$
Last-Modified: $Date$
Author: Guido van Rossum
Author: Guido van Rossum and Raymond D. Hettinger
Status: Draft
Type: Standards Track
Content-Type: text/plain
@ -27,7 +27,8 @@ Proposal
The proposed syntax is as follows:
<expression1> if <condition> else <expression2>
(if <condition>: <expression1> else: <expression2>)
This is evaluated like this:
@ -40,25 +41,10 @@ Proposal
result of the whole thing.
Note that at most one of <expression1> and <expression2> is
evaluated. This is called a "shortcut expression"; it is similar
evaluated. This is called a "short-circuit expression"; it is similar
to the way the second operand of 'and' / 'or' is only evaluated if
the first operand is true / false.
To disambiguate this in the context of other operators, the
"if...else" part in the middle acts like a right-associative
binary operator with a priority lower than that of "or", and
higher than that of "lambda".
Examples of how this works out:
x if C else y if D else z <==> x if C else (y if D else z)
x or y if C else z <==> (x or y) if C else z
x if C else y or z <==> x if C else (y or z)
lambda: x if C else y <==> lambda: (x if C else y)
x if C else lambda: y <==> SyntaxError
x if C else y, z <==> (x if C else y), z
x, y if C else z <==> x, (y if C else z)
Note: a common way to emulate an if-then-else expression is:
<condition> and <expression1> or <expression2>
@ -71,13 +57,22 @@ Proposal
Alternatives
The original version of this PEP proposed the following syntax:
<expression1> if <condition> else <expression2>
The out-of-order arrangement was found to be too uncomfortable
for many of the participants in the discussion.
---
Many C-derived languages use this syntax:
<condition> ? <expression1> : <expression2>
Eric Raymond even implemented this. I reject this for several
reasons: the colon already has many uses in Python (even though it
would actually not be ambiguous, because the question mark
Eric Raymond even implemented this. The BDFL rejected this for
several reasons: the colon already has many uses in Python (even
though it would actually not be ambiguous, because the question mark
requires a matching colon); for people not used to C-derived
languages, it is hard to understand.
@ -93,49 +88,24 @@ Alternatives
---
If we could live with adding a new keyword, we could use:
Raymond Hettinger proposed a variant that removes the
arbitrariness:
if <condition> then <expression1> else <expression2>
<condition> ?? <expression1> || <expression2>
Apart from the problem of introducing a new keyword for a minor
feature, this also suffers from ambiguity at the start of a
statement; for example:
if verbose then sys.stdout.write("hello\n") else None
could be a syntactically correct expression statement, but starts
with 'if', which makes the parser believe it is the start of an
'if' statement. To resolve this, the syntax would have to require
parentheses, which makes it uglier. However, this form has the
advantage of evaluating strictly from left to right (not that that
is a requirement for being Pythonic -- list comprehensions don't).
---
To deal with the problem of adding a new keyword, this variant has
been proposed:
if <condition> : <expression1> else <expression2>
This has the same ambiguity problem as the previous one (I would
even say more so), and lacks symmetry. It also begs the question
why there isn't a colon after the 'else'. But this:
if <condition> : <expression1> else: <expression2>
is even more confusing because it resembles the if statement so
much. (A solution that *doesn't* resemble the if statement is
better IMO since it should be obvious at first glance whether
we're dealing with an if expression or with an if statement.
Placing the 'if' in the middle somehow satisfies this
requirement.)
The ?? and || are not arbitrary as they strongly suggest testing
and alternation. Another merit is that existing operators
are not overloaded. Having two characters at each step also
helps visually separate the subordinate expressions. Alas,
the BDFL prefers the proposed syntax and considers this as
alternative number one.
---
Many people suggest adding a new builtin instead of extending the
syntax of the language, e.g.:
ifelse(condition, expression1, expression2)
ifelse(<condition>, <expression1>, <expression2>)
This won't work the way a syntax extension will because both
expression1 and expression2 must be evaluated before the function
@ -143,27 +113,121 @@ Alternatives
evaluation.
Variations
Summary of the Current State of the Discussion
It has been proposed to make the 'else' part optional. This would
be a really bad idea. I showed:
Groups are falling into one of five camps:
x = e if C
1. Adopt a ternary operator built using punctuation characters.
It would look something like:
<condition> ?? <expression1> || <expression2>
to several people. They all thought that if C was false, it would
leave x unchanged. So don't even think about this one!
2. Adopt a ternary operator built using existing keywords.
The proposal listed above is the leading example.
---
3. Adopt a ternary operator built using a new keyword.
The leading contender looks like this:
cond(<condition>, <expression1>, <expression2>)
Another variant proposes to use 'when' instead of 'if':
4. Adopt a function without short-circuit behavior:
cond(<condition>, <expression1>, <expression2>)
<expression1> when <condition> else <expression2>
5. Do nothing.
I don't see the advantage of 'when' over 'if'; it adds a new
keyword which is a major extra hurdle to introduce this. I think
that using a different keyword suggests that the semantics are
different than those of an 'if' statement; but they really aren't
(only the syntax is different).
The first two positions are relatively similar.
Some find that any form of punctuation makes the language more
cryptic. Others find that punctuation style is appropriate
for expressions rather than statements and helps avoid a COBOL
style: 3 plus 4 times 5.
Adapting existing keywords attempts to improve on punctuation
through explicit meaning and a more tidy appearance. The downside
is some loss of the economy-of-expression provided by punctuation
operators. The other downside is that it creates some degree of
confusion between the two meanings and two usages of the keywords.
The third form introduces a new keyword and arranges the arguments
separated by commas. Adding a new keyword is to be generally avoided.
But the form is clear, short, and direct. There is a possible
confusion with function syntax which implies that all the arguments
are evaluated rather than short-circuited. This idea was presented
by the BDFL and should be considered a contender for the final vote.
The exact keyword is still an open question. One proposal was iif(),
but it looks like a typo and can be confused with if-and-only-if
which has a different, well-defined mathematical meaning.
The fourth position is much more conservative. Adding a new
function, cond(), is trivially easy to implement and fits easily
within the existing Python model. Users of older versions of
Python will find it trivial to simulate. The downside is that
it does not provide the sought-after short-circuit
evaluation (see the discussion below on the need for this).
The bigger downside is that the BDFL opposes *any* solution that
does not provide short-circuit behavior.
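A minimal sketch of the fourth position, assuming a plain function
named cond() as in the text. The traced() helper and the calls list
are hypothetical names, used only to record which argument
expressions were evaluated:

```python
calls = []  # hypothetical log of evaluated argument expressions


def cond(condition, expression1, expression2):
    """Function form: returns one of the two values, but the caller
    has already evaluated both before the call begins."""
    if condition:
        return expression1
    return expression2


def traced(label, value):
    """Illustrative helper: note that an expression was evaluated."""
    calls.append(label)
    return value


result = cond(True, traced("first", 1), traced("second", 2))
# result is 1, yet calls holds both "first" and "second"
```

The function is easy to write, but the second argument is computed
even though its value is discarded.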
The last position is doing nothing. Arguments in favor include
keeping the language simple and concise; maintaining backwards
compatibility; and that every use case can already be
expressed in terms of "if" and "else". Lambda expressions are
an exception as they require the conditional to be factored out
into a separate function definition.
The arguments against doing nothing are that the other choices
allow greater economy of expression and that current practices
show a propensity for erroneous uses of "and", "or", or one of their
more complex, visually unappealing workarounds.
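As an illustration of the kind of error meant here, the sketch below
uses the "and"/"or" emulation with a false value for <expression1>.
The variable names are placeholders chosen for this example:

```python
condition = True
expression1 = ""          # a false value
expression2 = "fallback"

# The emulation picks expression2 whenever expression1 is false,
# even though the condition is true.
emulated = condition and expression1 or expression2

# A well-known workaround wraps each value in a one-element list,
# which is always true, and then unwraps the winner.
workaround = (condition and [expression1] or [expression2])[0]
```

Here emulated ends up as "fallback" while workaround is the intended
empty string, at the cost of a much less readable expression.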
It should also be mentioned that most supporters of any of the
first four positions do not want an imperfect solution
and would sooner have no change than create a wart to attain
their desired functionality.
Short-Circuit Behavior
The principal difference between the ternary operator
and the cond() function is that the latter provides an expression
form but does not provide short-circuit evaluation.
Short-circuit evaluation is desirable on three occasions:
1. When an expression has side-effects
2. When one or both of the expressions are resource intensive
3. When the condition serves as a guard for the validity of the
expression.
# Example where all three reasons apply
data = isinstance(source, file) ?? source.readlines()
|| source.split()
1. readlines() moves the file pointer
2. for long sources, both alternatives take time
3. split() is only valid for strings and readlines() is only
valid for file objects.
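The guard case can be sketched in present-day Python under a few
assumptions: io.StringIO stands in for a file object, io.IOBase
replaces the old file type in the isinstance() test, and
load_guarded() and load_eager() are hypothetical names. An ordinary
if statement plays the role of the short-circuiting form:

```python
import io


def cond(condition, expression1, expression2):
    """Eager function form, as in the fourth position."""
    if condition:
        return expression1
    return expression2


def load_guarded(source):
    """Only the branch that is valid for this source is evaluated."""
    if isinstance(source, io.IOBase):
        return source.readlines()
    return source.split()


def load_eager(source):
    """Both method calls run before cond() is even entered."""
    return cond(isinstance(source, io.IOBase),
                source.readlines(),
                source.split())


text_result = load_guarded("a b c")
file_result = load_guarded(io.StringIO("x\ny\n"))

try:
    load_eager("a b c")
    eager_error = None
except AttributeError as exc:
    # str has no readlines(), so the unused branch raises
    eager_error = exc
```

The guarded version handles both kinds of source, while the eager
version fails on a string because the invalid alternative is
evaluated anyway.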
Supporters of the cond() function point out that the need for
short-circuit evaluation is rare. Scanning through existing
code directories, they found that if/else did not occur often;
and of those only a few contained expressions that could be
helped by cond() or a ternary operator; and that most of those
had no need for short-circuit evaluation. Hence, cond() would
suffice for most needs and would spare efforts to alter the
syntax of the language.
More supporting evidence comes from scans of C code
bases which show that its ternary operator is used very rarely
(as a percentage of lines of code).
A counterpoint to that analysis is that the availability
of a ternary operator helped the programmer in every case
because it spared the need to search for side-effects.
Further, it would preclude errors arising from distant
modifications which introduce side-effects. The latter case
has become more of a reality with the advent of properties
where even attribute access can be given side-effects.
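A small sketch of the property concern, assuming a hypothetical
Sensor class whose value attribute counts each access. The cond()
function is the eager form discussed above:

```python
class Sensor:
    """Hypothetical class whose attribute access has a side effect."""

    def __init__(self):
        self.reads = 0

    @property
    def value(self):
        self.reads += 1   # side effect hidden behind attribute access
        return 42


def cond(condition, expression1, expression2):
    if condition:
        return expression1
    return expression2


sensor = Sensor()
result = cond(False, sensor.value, 0)
# result is 0, but sensor.reads is 1: the unused branch still
# triggered the property
```

Nothing at the call site hints that sensor.value does any work, which
is why a later change to a property can alter the behavior of code
that uses an eager cond().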
Still, the point is moot since the BDFL opposes solutions
which do not provide short-circuit behavior.
Copyright