PEP: 308
Title: If-then-else expression
Version: $Revision$
Last-Modified: $Date$
Author: Guido van Rossum and Raymond D. Hettinger
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 7-Feb-2003
Post-History: 7-Feb-2003, 11-Feb-2003


Introduction

    Requests for an if-then-else ("ternary") expression keep coming up
    on comp.lang.python.  This PEP contains a concrete proposal of a
    fairly Pythonic syntax.  This is the community's one chance: if
    this PEP is approved with a clear majority, it will be implemented
    in Python 2.4.  If not, the PEP will be augmented with a summary
    of the reasons for rejection and the subject better not come up
    again.  While I am the author of this PEP, I am neither in favor
    nor against this proposal; it is up to the community to decide.
    If the community can't decide, I'll reject the PEP.

    After unprecedented community response (very good arguments were
    made both pro and con) this PEP has been revised with the help of
    Raymond Hettinger.  Without going through a complete revision
    history, the main changes are a different proposed syntax, an
    overview of proposed alternatives, the state of the current
    discussion, and a discussion of short-circuit behavior.


Proposal

    The proposed syntax is as follows:

        (if <condition>: <expression1> else: <expression2>)


    This is evaluated like this:

    - First, <condition> is evaluated.

    - If <condition> is true, <expression1> is evaluated and is the
      result of the whole thing.

    - If <condition> is false, <expression2> is evaluated and is the
      result of the whole thing.

    Note that at most one of <expression1> and <expression2> is
    evaluated.  This is called a "short-circuit expression"; it is
    similar to the way the second operand of 'and' / 'or' is only
    evaluated if the first operand is true / false.
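
    The evaluation rule above can be sketched in current Python by
    passing each operand as a zero-argument function, so that it runs
    only when called.  The names if_then_else() and record() are
    hypothetical, chosen only for this illustration; they are not
    part of the proposal.

```python
def if_then_else(condition, expression1, expression2):
    """Emulate the proposed expression; each argument is a callable."""
    if condition():
        return expression1()
    return expression2()


evaluated = []  # names of the operands that were actually evaluated


def record(name, value):
    """Note that an operand was evaluated, then return its value."""
    evaluated.append(name)
    return value


result = if_then_else(
    lambda: record("condition", True),
    lambda: record("expression1", "yes"),
    lambda: record("expression2", "no"),
)
# Only the condition and the selected branch were evaluated;
# "expression2" never appears in the evaluated list.
```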

    A common way to emulate an if-then-else expression is:

        <condition> and <expression1> or <expression2>

    However, this doesn't work the same way: it returns <expression2>
    when <expression1> is false!  See FAQ 4.16 for alternatives that
    work -- however, they are pretty ugly and require much more effort
    to understand.
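
    A minimal sketch of the pitfall, using a false value (0) for
    <expression1>.  The variable names are illustrative only.  The
    second form, which wraps each branch in a one-element list so
    that it is always true, is one well-known workaround of the kind
    referred to above.

```python
condition = True
expression1 = 0           # a perfectly valid result that happens to be false
expression2 = "fallback"

# The and/or idiom: falls through to expression2 because 0 is false.
broken = condition and expression1 or expression2

# List-wrapping workaround: [0] is a non-empty list, hence true.
workaround = (condition and [expression1] or [expression2])[0]
```

    Here the and/or form yields "fallback" although the condition is
    true, while the list-wrapping form yields 0 as intended.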


Alternatives

    The original version of this PEP proposed the following syntax:

        <expression1> if <condition> else <expression2>

    The out-of-order arrangement was found to be too uncomfortable
    for many of the participants in the discussion; especially when
    <expression1> is long, it's easy to miss the conditional while
    skimming.

    ---

    Many C-derived languages use this syntax:

        <condition> ? <expression1> : <expression2>

    Eric Raymond even implemented this.  The BDFL rejected this for
    several reasons: the colon already has many uses in Python (even
    though it would actually not be ambiguous, because the question
    mark requires a matching colon); for people not used to C-derived
    languages, it is hard to understand.

    ---

    David Ascher proposed a variant that doesn't have this problem:

        <condition> ? <expression1> ! <expression2>

    While cute, this suffers from the Perlish problem of using
    arbitrary punctuation with an arbitrary meaning; and it's no
    easier to understand than the ?: form.

    ---

    Raymond Hettinger proposed a variant that removes the
    arbitrariness:

            <condition> ?? <expression1> || <expression2>

    The ?? and || are not arbitrary as they strongly suggest testing
    and alternation.  Another merit is that existing operators
    are not overloaded.  Having two characters at each step also helps
    visually separate the subordinate expressions.  Alas, the BDFL
    prefers the proposed syntax and considers this alternative "too
    Perlish".

    ---

    Many people suggest adding a new builtin instead of extending the
    syntax of the language, e.g.:

        cond(<condition>, <expression1>, <expression2>)

    This won't work the way a syntax extension will because both
    expression1 and expression2 must be evaluated before the function
    is called.  There's no way to short-circuit the expression
    evaluation.  It could work if 'cond' (or some other name) were
    made a keyword, but that has all the disadvantages of adding a new
    keyword, plus confusing syntax: it *looks* like a function call so
    a casual reader might expect both <expression1> and <expression2>
    to be evaluated.
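
    The eager evaluation described above can be observed with a
    plain function.  This is only a sketch: cond() here is an
    ordinary user-defined function, not a builtin, and branch() is a
    hypothetical helper that records when it is called.

```python
def cond(condition, expression1, expression2):
    """Ordinary function: its arguments are evaluated before the call."""
    if condition:
        return expression1
    return expression2


evaluated = []  # branches whose expressions were actually computed


def branch(name):
    """Record that this branch expression was computed."""
    evaluated.append(name)
    return name


result = cond(True, branch("expression1"), branch("expression2"))
# Both branch() calls ran, even though only one result was needed.
```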


Summary of the Current State of the Discussion

    Groups are falling into one of five camps:

    1.  Adopt a ternary operator built using punctuation characters.
        It would look something like:
            <condition> ?? <expression1> || <expression2>

    2.  Adopt a ternary operator built using existing keywords.
        The proposal listed above is the leading example.

    3.  Adopt a ternary operator built using a new keyword.
        The leading contender looks like this:
            cond(<condition>, <expression1>, <expression2>)

    4.  Adopt a function without short-circuit behavior:
            cond(<condition>, <expression1>, <expression2>)

    5.  Do nothing.

    The first two positions are relatively similar.

    Some find that any form of punctuation makes the language more
    cryptic.  Others find that punctuation style is appropriate for
    expressions rather than statements and helps avoid a COBOL style:
    3 plus 4 times 5.

    Adapting existing keywords attempts to improve on punctuation
    through explicit meaning and a more tidy appearance.  The downside
    is some loss of the economy-of-expression provided by punctuation
    operators.  The other downside is that it creates some degree of
    confusion between the two meanings and two usages of the keywords.

    The third form introduces a new keyword and arranges the arguments
    separated by commas.  Adding a new keyword is generally to be
    avoided.  But the form is clear, short, and direct.  There is a
    possible confusion with function syntax which implies that all the
    arguments are evaluated rather than short-circuited.  This idea
    was presented by the BDFL and should be considered a contender for
    the final vote.  The exact keyword is still an open question.  One
    proposal was iif(), but it looks like a typo and can be confused
    with if-and-only-if which has a different, well-defined
    mathematical meaning.

    The fourth position is much more conservative.  Adding a new
    function, cond(), is trivially easy to implement and fits easily
    within the existing Python model.  Users of older versions of
    Python will find it trivial to simulate.  The downside is that it
    does not provide the sought-after short-circuit evaluation (see
    the discussion below on the need for this).  The bigger downside
    is that the BDFL opposes *any* solution that does not provide
    short-circuit behavior.

    The last position is doing nothing.  Arguments in favor include
    keeping the language simple and concise; maintaining backwards
    compatibility; and that every use case can already be
    expressed in terms of "if" and "else".  Lambda expressions are an
    exception as they require the conditional to be factored out into
    a separate function definition.
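
    The lambda case can be illustrated as follows.  Because a lambda
    body must be a single expression, the if statement has to live
    in a named helper.  The name sign_label() is hypothetical and
    serves only this example.

```python
def sign_label(n):
    """The conditional, factored out because lambda cannot hold it."""
    if n < 0:
        return "negative"
    else:
        return "non-negative"


# The lambda can only delegate to the helper defined above.
labels = list(map(lambda n: sign_label(n), [-2, 0, 3]))
```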

    The arguments against doing nothing are that the other choices
    allow greater economy of expression and that current practices
    show a propensity for erroneous uses of "and", "or", or one of
    their more complex, visually unappealing workarounds.

    It should also be mentioned that most supporters of any of the
    first four positions do not want an imperfect solution and would
    sooner have no change than create a wart to attain their desired
    functionality.


Short-Circuit Behavior

    The principal difference between the ternary operator and the
    cond() function is that the latter provides an expression form but
    does not provide short-circuit evaluation.

    Short-circuit evaluation is desirable on three occasions:

    1. When an expression has side-effects
    2. When one or both of the expressions are resource intensive
    3. When the condition serves as a guard for the validity of the
       expression.

    #  Example where all three reasons apply
    data = isinstance(source, file)  ??  source.readlines()
                                     ||  source.split()

    1. readlines() moves the file pointer
    2. for long sources, both alternatives take time
    3. split() is only valid for strings and readlines() is only
       valid for file objects.
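
    For comparison, here is a runnable sketch of the same logic
    written with today's if statement, alongside an eager cond()
    function.  Two assumptions are made for the sake of a
    self-contained example: file-like objects are recognised by
    testing against io.IOBase rather than the built-in file type,
    and the names load() and cond() are hypothetical.

```python
import io


def cond(condition, expression1, expression2):
    """Eager stand-in: both arguments are computed by the caller."""
    if condition:
        return expression1
    return expression2


def load(source):
    """Statement form: only the valid branch is ever evaluated."""
    if isinstance(source, io.IOBase):   # assumption: replaces 'file'
        return source.readlines()
    return source.split()


words = load("a b c")
lines = load(io.StringIO("a\nb\n"))

# With cond(), the guard cannot protect the invalid branch: a string
# has no readlines() method, so evaluating the arguments fails.
text = "a b c"
try:
    cond(isinstance(text, io.IOBase), text.readlines(), text.split())
    eager_failed = False
except AttributeError:
    eager_failed = True
```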

    Supporters of the cond() function point out that the need for
    short-circuit evaluation is rare.  Scanning through existing code
    directories, they found that if/else did not occur often; and of
    those only a few contained expressions that could be helped by
    cond() or a ternary operator; and that most of those had no need
    for short-circuit evaluation.  Hence, cond() would suffice for
    most needs and would spare efforts to alter the syntax of the
    language.

    More supporting evidence comes from scans of C code bases which
    show that its ternary operator is used very rarely (as a
    percentage of lines of code).

    A counterpoint to that analysis is that the availability of a
    ternary operator helped the programmer in every case because it
    spared the need to search for side-effects.  Further, it would
    preclude errors arising from distant modifications which introduce
    side-effects.  The latter case has become more of a reality with
    the advent of properties where even attribute access can be given
    side-effects.
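
    The point about properties can be sketched as follows.  The
    Sensor class and its value property are hypothetical; cond() is
    again an ordinary function without short-circuit behavior.  What
    looks like a harmless attribute access is evaluated, side-effect
    included, even when the other branch is selected.

```python
def cond(condition, expression1, expression2):
    """Ordinary function: its arguments are evaluated before the call."""
    if condition:
        return expression1
    return expression2


class Sensor(object):
    def __init__(self):
        self.reads = 0

    def _get_value(self):
        self.reads += 1     # side-effect hidden behind attribute access
        return 42

    value = property(_get_value)


sensor = Sensor()
use_sensor = False

# Eager form: sensor.value is read although the result is discarded.
chosen = cond(use_sensor, sensor.value, 0)
reads_after_cond = sensor.reads

# Statement form: the property is never touched on this path.
if use_sensor:
    chosen_stmt = sensor.value
else:
    chosen_stmt = 0
reads_after_if = sensor.reads
```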

    The BDFL's position is that short-circuit behavior is essential
    for an if-then-else construct to be added to the language.


Copyright

    This document has been placed in the public domain.


Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End: