Added guards and walrus patterns (#1644)

Guido van Rossum 2020-10-09 10:27:58 -07:00 committed by GitHub
parent a0a8919aff
commit 1dc309632a
1 changed file with 184 additions and 94 deletions


@ -107,6 +107,19 @@ this: it can only select based on the class.
For a complete example, see
https://github.com/gvanrossum/patma/blob/master/examples/expr.py#L231
Like the Visitor pattern, pattern matching allows for a strict separation
of concerns: specific actions or data processing is independent of the
class hierarchy or manipulated objects. When dealing with predefined or
even built-in classes, in particular, it is often impossible to add further
methods to the individual classes. Pattern matching not only relieves the
programmer or class designer from the burden of the boilerplate code needed
for the Visitor pattern, but is also flexible enough to work directly with
built-in types. It naturally distinguishes between sequences of different
lengths, which might all share the same class despite obviously differing
structures. Moreover, pattern matching automatically takes inheritance
into account: a class *D* inheriting from *C* will be handled by a pattern
that targets *C* by default.
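
To illustrate the last point, here is a minimal sketch (the classes ``C``
and ``D`` and the ``handle`` function are made up for this example)::

    class C:
        pass

    class D(C):
        pass

    def handle(obj):
        match obj:
            case C():    # matches instances of C as well as of its subclass D
                return "handled as C"
            case _:
                return "something else"

    handle(D())          # returns "handled as C"
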
TODO: Could we say more here?
Pattern and functional style
@ -124,7 +137,10 @@ a JSON data structure using ``match``.
TODO: Example code.
Functional programming generally prefers a declarative style with a focus
on relationships in data. Side effects are avoided whenever possible.
Pattern matching thus naturally fits and strongly supports a functional
programming style.
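
For instance, a small declarative sketch of extracting data from a JSON-like
structure (the record shapes and the ``city_of`` helper are invented for
illustration)::

    def city_of(record):
        match record:
            case {"address": {"city": city}}:
                return city
            case {"addresses": [{"city": city}, *_]}:
                return city
            case _:
                return None

    city_of({"addresses": [{"city": "Oslo"}, {"city": "Bergen"}]})   # -> "Oslo"
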
Rationale
@ -174,7 +190,7 @@ semantic meaning.
Various suggestions have sought to eliminate or avoid the naturally arising
"double indentation" of a case clause's code block. Unfortunately, all such
proposals of *flat indentation schemes* come at the expense of violating
Python's establish structural paradigm, leading to additional syntactic
Python's established structural paradigm, leading to additional syntactic
rules:
- *Unindented case clauses.*
@ -191,7 +207,7 @@ rules:
neither follow the syntactic scheme of simple nor composite statements
but rather establish a category of its own.
- *Putting the expression on a separate line after ``match``.*
- *Putting the expression on a separate line after "match".*
The idea is to use the expression yielding the subject as a statement
to avoid the singularity of ``match`` having no actual block despite
the colons::
@ -220,7 +236,7 @@ PEP, a noticeable improvement in code brevity is observed, more than making
up for the additional indentation level.
*Statement v Expression.* Some suggestions centered around the idea of
*Statement vs. Expression.* Some suggestions centered around the idea of
making ``match`` an expression rather than a statement. However, this
would fit poorly with Python's statement-oriented nature and lead to
unusually long and complex expressions with the need to invent new
@ -239,7 +255,7 @@ The patterns of different case clauses might overlap in that more than
one case clause would match a given subject. The first-to-match rule
ensures that the selection of a case clause for a given subject is
unambiguous. Furthermore, case clauses can have increasingly general
patterns matching wider classes of subjects. The first-to-match rule
patterns matching wider sets of subjects. The first-to-match rule
then ensures that the most precise pattern can be chosen (although it
is the programmer's responsibility to order the case clauses correctly).
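
To illustrate, the case clauses in the following sketch (the ``classify``
helper is made up) are ordered from most specific to most general, so that
the most precise matching pattern wins::

    def classify(seq):
        match seq:
            case []:          # most specific: the empty sequence
                return "empty"
            case [_]:         # exactly one element
                return "singleton"
            case [_, *_]:     # most general: one or more elements
                return "several"

    classify([1])   # -> "singleton"; the more general last clause would also
                    #    match, but is never reached
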
@ -249,7 +265,7 @@ This would, however, require that all patterns be purely declarative and
static, running against the established dynamic semantics of Python. The
proposed semantics thus represent a path incorporating the best of both
worlds: patterns are tried in a strictly sequential order so that each
case clause constitutes an actual stement. At the same time, we allow
case clause constitutes an actual statement. At the same time, we allow
the interpreter to cache any information about the subject or change the
order in which subpatterns are tried. In other words: if the interpreter
has found that the subject is not an instance of a class ``C``, it can
@ -268,7 +284,7 @@ would essentially mean that each case clause is a separate function without
direct access to the variables in the surrounding scope (without having to
resort to ``nonlocal`` that is). Moreover, a case clause could no longer
influence any surrounding control flow through standard statement such as
``return`` or ``break``. Hence, such script scoping would lead to
``return`` or ``break``. Hence, such strict scoping would lead to
unintuitive and surprising behavior.
A direct consequence of this is that any variable bindings outlive the
@ -279,6 +295,51 @@ bindings is in line with existing Python structures such as for loops and
with statements.
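
A minimal sketch of this scoping behavior (the ``command`` string is just an
arbitrary example)::

    command = "move north"
    match command.split():
        case [action, direction]:
            pass

    # Like a for-loop variable, the names bound by the matching case remain
    # available after the match statement has finished:
    print(action, direction)    # prints: move north
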
Guards
~~~~~~
Some constraints cannot be adequately expressed through patterns alone.
For instance, a 'less' or 'greater than' relationship defies the usual
'equal' semantics of patterns. Moreover, different subpatterns are
independent and cannot refer to each other. The addition of *guards*
addresses these restrictions: a guard is an arbitrary expression attached
to a pattern; it must evaluate to ``True`` for the pattern to succeed.
For example, ``case [x, y] if x < y:`` uses a guard (``if x < y``) to
express a 'less than' relationship between two otherwise disjoint capture
patterns ``x`` and ``y``.
From a conceptual point of view, patterns describe structural constraints
on the subject in a declarative style, ideally without any side-effects.
Recall, in particular, that patterns are clearly distinct from expressions,
following different objectives and semantics. Guards then enhance the
patterns in a highly controlled way with arbitrary expressions (that might
have side effects). Splitting the overall pattern into a static structural
and a dynamic 'evaluative' part not only helps with readability, but can
also open up significant potential for compiler optimizations. To keep this
clear separation, guards are only supported on the level of case clauses
and not for individual patterns.
Example using guards::
    def sort(seq):
        match seq:
            case [] | [_]:
                return seq
            case [x, y] if x <= y:
                return seq
            case [x, y]:
                return [y, x]
            case [x, y, z] if x <= y <= z:
                return seq
            case [x, y, z] if x >= y >= z:
                return [z, y, x]
            case [p, *rest]:
                a = sort([x for x in rest if x <= p])
                b = sort([x for x in rest if p < x])
                return a + [p] + b
.. _patterns:
Patterns
@ -312,9 +373,37 @@ patterns as declarative elements similar to the formal parameters in a
function definition.
Walrus patterns
~~~~~~~~~~~~~~~
Walrus/AS patterns
~~~~~~~~~~~~~~~~~~
Patterns fall into two categories: most patterns impose a (structural)
constraint that the subject needs to fulfill, whereas the capture pattern
binds the subject to a name without regard for the subject's structure or
actual value. Consequently, a pattern can either express a constraint or
bind a value, but not both. Walrus/AS patterns fill this gap in that they
allow the user to specify a general pattern as well as capture the subject
in a variable.
Typical use cases for the Walrus/AS pattern include OR and Class patterns
together with a binding name as in, e.g., ``case BinOp(op := '+'|'-', ...):``
or ``case [first := int(), second := int()]:``. The latter could be
understood as saying that the subject must fulfill two distinct patterns:
``[first, second]`` as well as ``[int(), int()]``. The Walrus/AS pattern
can thus be seen as a special case of an 'and' pattern (see OR patterns
below for an additional discussion of 'and' patterns).
Example using the Walrus/AS pattern::
    def simplify_expr(tokens):
        match tokens:
            case [l:=('('|'['), *expr, r:=(')'|']')] if (l+r) in ('()', '[]'):
                return simplify_expr(expr)
            case [0, op:=('+'|'-'), right]:
                return UnaryOp(op, right)
            case [left:=(int() | float()) | Num(left), '+', right:=(int() | float()) | Num(right)]:
                return Num(left + right)
            case [value:=(int() | float())]:
                return Num(value)
OR patterns
@ -366,9 +455,9 @@ OR-patterns to be nested inside other patterns:
in *C*). Also, this would be a novel indentation pattern, which might make
it harder to support in IDEs and such (it would break the simple rule "add
an indentation level after a line ending in a colon"). Finally, this
would not support OR patterns nested inside other patterns.
would not support OR patterns nested inside other patterns, either.
- *Using ``case in`` followed by a comma-separated list*::
- *Using "case in" followed by a comma-separated list*::
      case in 401, 403, 404:
          print("Some HTTP error")
@ -396,13 +485,14 @@ exactly if the pattern itself does not match. For instance, ``!(3 | 4)``
would match anything except ``3`` or ``4``. However, there is evidence from
other languages that this is rarely useful and primarily used as double
negation ``!!`` to control variable scopes and prevent variable bindings
(which does not apply to Python).
(which does not apply to Python). Other use cases are better expressed using
guards.
In the end, it was decided that this would make the syntax more complex
without adding a significant benefit.
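
For instance, the effect of a hypothetical negation pattern such as
``!(3 | 4)`` can be obtained with a guard instead; a minimal sketch (the
``check`` helper is made up)::

    def check(value):
        match value:
            case x if x not in (3, 4):   # plays the role of the rejected !(3 | 4)
                return "neither 3 nor 4"
            case _:
                return "3 or 4"
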
Example::
Example using the OR pattern::
    def simplify(expr):
        match expr:
@ -415,6 +505,73 @@ Example::
return expr
.. _literal_pattern:
Literal Patterns
~~~~~~~~~~~~~~~~
Literal patterns are a convenient way of imposing constraints on the
value of a subject, rather than its type or structure. Literal patterns
even allow you to emulate a switch statement using pattern matching.
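
For instance, a simple switch-like dispatch on literal values might look like
the following sketch (the helper and the particular values are only for
illustration)::

    def describe_status(status):
        match status:
            case 200:
                return "OK"
            case 301 | 302:
                return "Redirect"
            case 404:
                return "Not found"
            case _:
                return "Something else"
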
Generally, the subject is compared to a literal pattern by means of standard
equality (``x == y`` in Python syntax). Consequently, the literal patterns
``1.0`` and ``1`` match exactly the same set of objects, i.e. ``case 1.0:``
and ``case 1:`` are fully interchangeable. In principle, ``True`` would also
match the same set of objects because ``True == 1`` holds. However, we
believe that many users would be surprised to find that ``case True:``
matched the subject ``1.0``, resulting in some subtle bugs and convoluted
workarounds. We therefore adopted the rule that the three singleton
objects ``None``, ``False`` and ``True`` match by identity (``x is y`` in
Python syntax) rather than equality. Hence, ``case True:`` will match only
``True`` and nothing else. Note that ``case 1:`` would still match ``True``,
though, because the literal pattern ``1`` works by equality and not identity.
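
A small sketch illustrating these rules (the ``show`` helper is made up for
illustration)::

    def show(subject):
        match subject:
            case True:     # compared by identity: only True itself matches
                return "is True"
            case 1:        # compared by equality: 1 and 1.0 match here (True
                           # would as well, were the case above absent)
                return "equals 1"
            case _:
                return "something else"

    show(True)    # -> "is True"
    show(1.0)     # -> "equals 1"
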
Early ideas to induce a hierarchy on numbers, so that ``case 1.0:`` would
match both the integer ``1`` and the floating point number ``1.0`` whereas
``case 1:`` would only match the integer ``1``, were eventually dropped in
favor of the simpler and more consistent rule based on equality. Moreover, any
additional check of whether the subject is an instance of ``numbers.Integral``
would come at a high runtime cost, only to introduce a distinction that would
essentially be novel in Python. When needed, the explicit syntax
``case int(1):`` might be used.
Recall that literal patterns are *not* expressions, but directly denote a
specific value or object. From a syntactical point of view, we have to
ensure that negative and complex numbers can equally be used as patterns,
although they are not atomic literal values (i.e. the seemingly literal value
``-3+4j`` would syntactically be an expression of the form
``BinOp(UnaryOp('-', 3), '+', 4j)``, but as expressions are not part of
patterns, we added syntactic support for such complex value literals without
having to resort to full expressions). Interpolated *f*-strings, on the
other hand, are not literal values despite their appearance, and can
therefore not be used as literal patterns (string concatenation, however,
is supported).
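
A brief sketch of such value literals used as patterns (the helper is made up
for illustration)::

    def classify_constant(value):
        match value:
            case -1:                # negative number literal
                return "minus one"
            case 3+4j:              # complex value literal
                return "three plus four i"
            case "north" "west":    # adjacent string literals are concatenated
                return "the string 'northwest'"
            case _:
                return "something else"

    classify_constant(3+4j)         # -> "three plus four i"
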
Literal patterns not only occur as patterns in their own right, but also
as keys in *mapping patterns*.
Example using Literal patterns::
    def simplify(expr):
        match expr:
            case ('+', 0, x):
                return x
            case ('+' | '-', x, 0):
                return x
            case ('and', True, x):
                return x
            case ('and', False, x):
                return False
            case ('or', False, x):
                return x
            case ('or', True, x):
                return True
            case ('not', ('not', x)):
                return x
        return expr
.. _capture_pattern:
Capture Patterns
@ -441,11 +598,11 @@ repeated use of names later on.
There were calls to explicitly mark capture patterns and thus identify them
as binding targets. According to that idea, a capture pattern would be
written as, e.g. ``?x`` or ``$x``. The aim of such explicit capture markers
is to let an unmarked name be a constant value pattern (see below). However,
this is based on the misconception that pattern matching was an extension of
*switch* statements, placing the emphasis on fast switching based on
(ordinal) values. Such a *switch* statement has indeed been proposed for
written as, e.g. ``?x``, ``$x`` or ``=x``. The aim of such explicit capture
markers is to let an unmarked name be a constant value pattern (see below).
However, this is based on the misconception that pattern matching was an
extension of *switch* statements, placing the emphasis on fast switching based
on (ordinal) values. Such a *switch* statement has indeed been proposed for
Python before (see :pep:`275` and :pep:`3103`). Pattern matching, on the other
hand, builds a generalized concept of iterable unpacking. Binding values
extracted from a data structure is at the very core of the concept and hence
@ -454,7 +611,7 @@ betray the objective of the proposed pattern matching syntax and simplify
a secondary use case at the expense of additional syntactic clutter for
core cases.
Example::
Example using Capture patterns::
    def average(*args):
        match args:
@ -503,7 +660,7 @@ of items is omitted::
    case [a, ..., z]: ...
    case [a, *, z]: ...
Both look like the would match a sequence of at two or more items,
Both examples look like they would match a sequence of two or more items,
capturing the first and last values.
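
With the star patterns that are part of this proposal, the same intent can be
written out explicitly; a minimal sketch (the helper is made up)::

    def first_and_last(seq):
        match seq:
            case [first, *_, last]:   # two or more items
                return (first, last)
            case [only]:              # a single item is both first and last
                return (only, only)
            case []:
                return None
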
A single wildcard clause (i.e. ``case _:``) is semantically equivalent to
@ -523,7 +680,7 @@ readability and learnability. In our view, concerns that this wildcard
means that a regular name received special treatment are not strong
enough to introduce syntax that would make Python special.
Example::
Example using the Wildcard pattern::
    def is_closed(sequence):
        match sequence:
@ -535,81 +692,14 @@ Example::
return False
.. _literal_pattern:
Literal Patterns
~~~~~~~~~~~~~~~~
Literal patterns are a convenient way for imposing constraints on the
value of a subject, rather than its type or structure. Literal patterns
even allow you to emulate a switch statement using pattern matching.
Generally, the subject is compared to a literal pattern by means of standard
equality (``x == y`` in Python syntax). Consequently, the literal patterns
``1.0`` and ``1`` match exactly the same set of objects, i.e. ``case 1.0:``
and ``case 1:`` are fully interchangable. In principle, ``True`` would also
match the same set of objects because ``True == 1`` holds. However, we
believe that many users would be surprised finding that ``case True:``
matched the object ``1.0``, resulting in some subtle bugs and convoluted
workarounds. We therefore adopted the rule that the three singleton
objects ``None``, ``False`` and ``True`` match by identity (``x is y`` in
Python syntax) rather than equality. Hence, ``case True:`` will match only
``True`` and nothing else. Note that ``case 1:`` would still match ``True``,
though, because the literal pattern ``1`` works by equality and not identity.
Early ideas to induce a hierarchy on numbers so that ``case 1.0`` would
match both the integer ``1`` and the floating point number ``1.0``, whereas
``case 1:`` would only match the integer ``1`` were eventually dropped in
favor of the simpler and consistent rule based on equality. Moreover, any
additional checks whether the subject is an instance of ``numbers.Integral``
would come at a high runtime cost to introduce what would essentially be
novel in Python. When needed, the explicit syntax ``case int(1):`` might
be used.
Recall that literal patterns are *not* expressions, but directly denote a
specific value or object. From a syntactical point of view, we have to
ensure that negative and complex numbers can equally be used as patterns,
although they are not atomic literal values (i.e. the seeming literal value
``-3+4j`` would syntactically be an expression of the form
``BinOp(UnaryOp('-', 3), '+', 4j)``, but as expressions are not part of
patterns, we added syntactic support for such complex value literals without
having to resort to full expressions). Interpolated *f*-strings, on the
other hand, are not literal values, despite their appearance and can
therefore not be used as literal patterns (string concatenation, however,
is supported).
Literal patterns not only occur as patterns in their own right, but also
as keys in *mapping patterns*.
Example::
    def simplify(expr):
        match expr:
            case ('+', 0, x):
                return x
            case ('+' | '-', x, 0):
                return x
            case ('and', True, x):
                return x
            case ('and', False, x):
                return False
            case ('or', False, x):
                return x
            case ('or', True, x):
                return True
            case ('not', ('not', x)):
                return x
        return expr
.. _constant_value_pattern:
Constant Value Patterns
~~~~~~~~~~~~~~~~~~~~~~~
Value Patterns
~~~~~~~~~~~~~~
It is good programming style to use named constants for parametric values or
to clarify the meaning of particular values. Clearly, it would be desirable
to also write ``case (HttpStatus.OK, body):`` rather than
to write ``case (HttpStatus.OK, body):`` rather than
``case (200, body):``, for example. The main issue that arises here is how to
distinguish capture patterns (variables) from constant value patterns. The
general discussion surrounding this issue has brought forward a plethora of
@ -650,7 +740,7 @@ patterns. Moreover, pattern matching could not be used directly inside a
module's scope because all variables would be global, making capture
patterns impossible.
Example::
Example using the Value pattern::
    def handle_reply(reply):
        match reply:
@ -752,7 +842,7 @@ understanding of the mapping pattern's semantics.
To avoid overly expensive matching algorithms, keys must be literals or
constant values.
Example::
Example using the Mapping pattern::
    def change_red_to_blue(json_obj):
        match json_obj:
@ -791,7 +881,7 @@ given in the motivation::
        a, b, c = node.left, node.right.left, node.right.right
        # Handle a + b*c
The class pattern lets you to concisely specify both an instance-check as
The class pattern lets you concisely specify both an instance-check as
well as relevant attributes (with possible further constraints). It is
thereby very tempting to write, e.g., ``case Node(left, right):`` in the
first case above and ``case Leaf(value):`` in the second. While this