PEP: 635
Title: Structural Pattern Matching: Motivation and Rationale
Version: $Revision$
Last-Modified: $Date$
Author: Tobias Kohn <kohnt@tobiaskohn.ch>,
        Guido van Rossum <guido@python.org>
BDFL-Delegate:
Discussions-To: Python-Dev <python-dev@python.org>
Status: Draft
Type: Informational
Content-Type: text/x-rst
Created: 12-Sep-2020
Python-Version: 3.10
Post-History:
Resolution:

Abstract
========

**NOTE:** This draft is incomplete and not intended for review yet.
We're checking it into the peps repo for the convenience of the authors.

This PEP provides the motivation and rationale for PEP 634
("Structural Pattern Matching: Specification"). First-time readers
are encouraged to start with PEP 636, which provides a gentler
introduction to the concepts, syntax and semantics of patterns.

TODO: Go over the feedback from the SC and make sure everything's
somehow addressed.

Motivation
==========

(Structural) pattern matching syntax is found in many languages, from
Haskell, Erlang and Scala to Elixir and Ruby. (A proposal for
JavaScript is also under consideration.)

Python already supports a limited form of this through sequence
unpacking assignments, which the new proposal leverages.

Several other common Python idioms are also relevant:

- The ``if ... elif ... elif ... else`` idiom is often used to find
  out the type or shape of an object in an ad-hoc fashion, using one
  or more checks like ``isinstance(x, cls)``, ``hasattr(x, "attr")``,
  ``len(x) == n`` or ``"key" in x`` as guards to select an applicable
  block. The block can then assume ``x`` supports the interface
  checked by the guard. For example::

      if isinstance(x, tuple) and len(x) == 2:
          host, port = x
          mode = "http"
      elif isinstance(x, tuple) and len(x) == 3:
          host, port, mode = x
      # Etc.

  Code like this is more elegantly rendered using ``match``::

      match x:
          case host, port:
              mode = "http"
          case host, port, mode:
              pass
          # Etc.

- AST traversal code often looks for nodes matching a given pattern,
  for example the code to detect a node of the shape "A + B * C" might
  look like this::

      if (isinstance(node, BinOp) and node.op == "+"
              and isinstance(node.right, BinOp) and node.right.op == "*"):
          a, b, c = node.left, node.right.left, node.right.right
          # Handle a + b*c

  Using ``match`` this becomes more readable::

      match node:
          case BinOp("+", a, BinOp("*", b, c)):
              # Handle a + b*c
              ...

- TODO: Other compelling examples?

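The AST example above uses a simplified, hypothetical ``BinOp`` class. As a
sketch of the same idea against the standard library's ``ast`` module
(assuming Python 3.10 or later; the function ``find_mul_add`` is
hypothetical), note that ``ast.BinOp`` stores its operator as an instance
such as ``ast.Add()``, so the pattern uses keyword subpatterns::

    import ast

    def find_mul_add(node):
        """Return (a, b, c) if node has the shape a + b * c, else None."""
        match node:
            case ast.BinOp(left=a, op=ast.Add(),
                           right=ast.BinOp(left=b, op=ast.Mult(), right=c)):
                return a, b, c
        return None

    tree = ast.parse("x + y * z", mode="eval").body
    print([ast.unparse(n) for n in find_mul_add(tree)])  # ['x', 'y', 'z']
    print(find_mul_add(ast.parse("x * y + z", mode="eval").body))  # None

Because ``*`` binds more tightly than ``+``, only the first expression has
the nested shape the pattern describes.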

We believe that adding pattern matching to Python will enable Python
users to write cleaner, more readable code for examples like those
above, and many others.

Pattern matching and OO
-----------------------

Pattern matching is complementary to the object-oriented paradigm.
Using OO and inheritance we can easily define a method on a base class
that defines default behavior for a specific operation on that class,
and we can override this default behavior in subclasses. We can also
use the Visitor pattern to separate actions from data.

But this is not sufficient for all situations. For example, a code
generator may consume an AST, and have many operations where the
generated code needs to vary based not just on the class of a node,
but also on the values of some of its attributes, like the ``BinOp``
example above. The Visitor pattern is insufficiently flexible for
this: it can only select based on the class.

For a complete example, see
https://github.com/gvanrossum/patma/blob/master/examples/expr.py#L231

Like the Visitor pattern, pattern matching allows for a strict separation
of concerns: specific actions or data processing are independent of the
class hierarchy of the manipulated objects. When dealing with predefined or
even built-in classes, in particular, it is often impossible to add further
methods to the individual classes. Pattern matching not only relieves the
programmer or class designer from the burden of the boilerplate code needed
for the Visitor pattern, but is also flexible enough to directly work with
built-in types. It naturally distinguishes between sequences of different
lengths, which might all share the same class despite obviously differing
structures. Moreover, pattern matching automatically takes inheritance
into account: a class *D* inheriting from *C* will be handled by a pattern
that targets *C* by default.

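A minimal sketch of the last two points, using hypothetical classes ``C``
and ``D`` and a hypothetical function ``describe`` (Python 3.10+): an
instance of the subclass is caught by the pattern for its base class, and
sequences of different lengths are told apart although they share the same
class::

    class C:
        pass

    class D(C):  # D inherits from C
        pass

    def describe(obj):
        match obj:
            case C():           # also matches instances of subclasses such as D
                return "a C"
            case []:
                return "empty sequence"
            case [_]:
                return "one-element sequence"
            case [_, _, *_]:
                return "sequence with two or more elements"
            case _:
                return "something else"

    print(describe(D()))        # a C
    print(describe([1, 2, 3]))  # sequence with two or more elements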

TODO: Could we say more here?

Pattern matching and functional style
-------------------------------------

Most Python applications and libraries are not written in a consistent
OO style -- unlike Java, Python encourages defining functions at the
top-level of a module, and for simple data structures, tuples (or
named tuples or lists) and dictionaries are often used exclusively or
mixed with classes or data classes.

Pattern matching is particularly suitable for picking apart such data
structures. As an extreme example, it's easy to write code that picks
apart a JSON data structure using ``match``.

TODO: Example code.

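As a minimal sketch of that idea (the message format and the function
``handle`` are hypothetical, chosen only for illustration), the following
uses mapping, sequence and class patterns to pick apart values decoded by
the standard ``json`` module (Python 3.10+)::

    import json

    def handle(text):
        match json.loads(text):
            case {"type": "point", "coords": [int(x), int(y)]}:
                return ("point", x, y)
            case {"type": "text", "value": str(value)}:
                return ("text", value)      # extra keys are ignored
            case [*items]:
                return ("list", len(items))
            case _:
                return ("unknown",)

    print(handle('{"type": "point", "coords": [1, 2]}'))  # ('point', 1, 2)
    print(handle('[1, 2, 3]'))                            # ('list', 3)

Mapping patterns only check the keys they mention, which suits loosely
structured JSON objects.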

Functional programming generally prefers a declarative style with a focus
on relationships in data. Side effects are avoided whenever possible.
Pattern matching thus fits naturally with, and strongly supports, a
functional programming style.

Rationale
=========

TBD.

This section should provide the rationale for individual design decisions.
It takes the place of "Rejected ideas" in the standard PEP format.
It is organized in sections corresponding to the specification (PEP 634).

Overview and terminology
------------------------

The ``match`` statement
-----------------------

The match statement evaluates an expression to produce a subject, finds the
first pattern that matches the subject and executes the associated block
of code. Syntactically, the match statement thus takes an expression and
a sequence of case clauses, where each case clause comprises a pattern and
a block of code.

Since case clauses comprise a block of code, they adhere to the existing
indentation scheme with the syntactic structure of
``<keyword> ...: <(indented) block>``, which in turn makes it a (compound)
statement. The chosen keyword ``case`` reflects its widespread use in
pattern matching languages, ignoring those languages that use other
syntactic means such as a symbol like ``|``, because such a symbol would not
fit established Python structures. The syntax of patterns following the
keyword is discussed below.

Given that the case clauses follow the structure of a compound statement,
the match statement itself naturally becomes a compound statement as well,
following the same syntactic structure. This naturally leads to
``match <expr>: <case_clause>+``. Note that the match statement determines
a quasi-scope in which the evaluated subject is kept alive (although not in
a local variable), similar to how a with statement might keep a resource
alive during execution of its block. Furthermore, control flows from the
match statement to a case clause and then leaves the block of the match
statement. The block of the match statement thus has both syntactic and
semantic meaning.

Various suggestions have sought to eliminate or avoid the naturally arising
"double indentation" of a case clause's code block. Unfortunately, all such
proposals of *flat indentation schemes* come at the expense of violating
Python's established structural paradigm, leading to additional syntactic
rules:

- *Unindented case clauses.*
  The idea is to align case clauses with the ``match``, i.e.::

      match expression:
      case pattern_1:
          ...
      case pattern_2:
          ...

  This may look awkward to the eye of a Python programmer, because
  everywhere else a colon is followed by an indent. The ``match`` would
  follow the syntactic scheme of neither simple nor compound statements,
  but would rather establish a category of its own.

- *Putting the expression on a separate line after "match".*
  The idea is to use the expression yielding the subject as a statement
  to avoid the singularity of ``match`` having no actual block despite
  the colons::

      match:
          expression
      case pattern_1:
          ...
      case pattern_2:
          ...

  This was ultimately rejected because the first block would be another
  novelty in Python's grammar: a block whose only content is a single
  expression rather than a sequence of statements. Attempts to amend this
  issue by adding or repurposing yet another keyword along the lines of
  ``match: return expression`` did not yield any satisfactory solution.

Although flat indentation would save some horizontal space, the cost of
increased complexity or unusual rules is too high. It would also complicate
life for simple-minded code editors. Finally, the horizontal space issue can
be alleviated by allowing "half-indent" (i.e. two spaces instead of four)
for match statements.

In sample programs using match, written as part of the development of this
PEP, a noticeable improvement in code brevity is observed, more than making
up for the additional indentation level.

*Statement vs. Expression.* Some suggestions centered around the idea of
making ``match`` an expression rather than a statement. However, this
would fit poorly with Python's statement-oriented nature and lead to
unusually long and complex expressions with the need to invent new
syntactic constructs or break well established syntactic rules. An
obvious consequence of ``match`` as an expression would be that case
clauses could no longer have arbitrary blocks of code attached, but only
a single expression. Overall, the strong limitations could in no way
offset the slight simplification in some special use cases.

Match semantics
~~~~~~~~~~~~~~~

The patterns of different case clauses might overlap in that more than
one case clause would match a given subject. The first-to-match rule
ensures that the selection of a case clause for a given subject is
unambiguous. Furthermore, case clauses can have increasingly general
patterns matching wider sets of subjects. The first-to-match rule
then ensures that the most precise pattern can be chosen (although it
is the programmer's responsibility to order the case clauses correctly).

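A small sketch of the first-to-match rule (the function ``classify`` is
hypothetical; Python 3.10+). The subject ``0`` matches both ``case 0:`` and
``case int():``, but only the first matching clause, here the most specific
one, is executed::

    def classify(n):
        match n:
            case 0:                  # most specific: a single value
                return "zero"
            case int() if n < 0:     # narrower than the next clause
                return "negative int"
            case int():              # most general int pattern goes last
                return "other int"
            case _:
                return "not an int"

    print(classify(0))    # zero
    print(classify(-5))   # negative int
    print(classify(7))    # other int

If the ``case int():`` clause were moved to the top, it would shadow the two
more specific clauses; this ordering is exactly what is left to the
programmer.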

In a statically typed language, the match statement would be compiled to
a decision tree to select a matching pattern quickly and very efficiently.
This would, however, require that all patterns be purely declarative and
static, running against the established dynamic semantics of Python. The
proposed semantics thus represent a path incorporating the best of both
worlds: patterns are tried in a strictly sequential order so that each
case clause constitutes an actual statement. At the same time, we allow
the interpreter to cache any information about the subject or change the
order in which subpatterns are tried. In other words: if the interpreter
has found that the subject is not an instance of a class ``C``, it can
directly skip case clauses testing for this again, without having to
perform repeated instance-checks. If a guard stipulates that a variable
``x`` must be positive, say (i.e. ``if x > 0``), the interpreter might
check this directly after binding ``x`` and before any further
subpatterns are considered.

*Binding and scoping.* In many pattern matching implementations, each
case clause would establish a separate scope of its own. Variables bound
by a pattern would then only be visible inside the corresponding case block.
In Python, however, this does not make sense. Establishing separate scopes
would essentially mean that each case clause is a separate function without
direct access to the variables in the surrounding scope (without having to
resort to ``nonlocal``, that is). Moreover, a case clause could no longer
influence any surrounding control flow through standard statements such as
``return`` or ``break``. Hence, such strict scoping would lead to
unintuitive and surprising behavior.

A direct consequence of this is that any variable bindings outlive the
respective case or match statements. Even patterns that only match a
subject partially might bind local variables (this is, in fact, necessary
for guards to function properly). However, this escaping of variable
bindings is in line with existing Python structures such as for loops and
with statements.

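The following sketch (the function ``first_pair`` is hypothetical; Python
3.10+) shows both points: ``break`` inside a case block leaves the
surrounding ``for`` loop, and the names bound by the successful pattern are
still visible after the ``match`` statement and the loop have finished::

    def first_pair(items):
        for item in items:
            match item:
                case (a, b):   # binds a and b in the function's scope
                    break      # affects the enclosing for loop
        else:
            return None        # loop finished without a break
        return a, b            # a and b outlive the match statement

    print(first_pair([1, "xy", (3, 4), (5, 6)]))  # (3, 4)
    print(first_pair([1, 2]))                     # None

The string ``"xy"`` is skipped because sequence patterns deliberately do
not match strings.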

Guards
~~~~~~

Some constraints cannot be adequately expressed through patterns alone.
For instance, a 'less' or 'greater than' relationship defies the usual
'equal' semantics of patterns. Moreover, different subpatterns are
independent and cannot refer to each other. The addition of *guards*
addresses these restrictions: a guard is an arbitrary expression attached
to a pattern that must evaluate to a true value for the pattern to succeed.

For example, ``case [x, y] if x < y:`` uses a guard (``if x < y``) to
express a 'less than' relationship between two otherwise disjoint capture
patterns ``x`` and ``y``.

From a conceptual point of view, patterns describe structural constraints
on the subject in a declarative style, ideally without any side effects.
Recall, in particular, that patterns are clearly distinct from expressions,
following different objectives and semantics. Guards then enhance the
patterns in a highly controlled way with arbitrary expressions (that might
have side effects). Splitting the overall pattern into a static structural
and a dynamic 'evaluative' part not only helps with readability, but can
also open up significant potential for compiler optimizations. To keep this
clear separation, guards are only supported on the level of case clauses
and not for individual patterns.

Example using guards::

    def sort(seq):
        match seq:
            case [] | [_]:
                return seq
            case [x, y] if x <= y:
                return seq
            case [x, y]:
                return [y, x]
            case [x, y, z] if x <= y <= z:
                return seq
            case [x, y, z] if x >= y >= z:
                return [z, y, x]
            case [p, *rest]:
                a = sort([x for x in rest if x <= p])
                b = sort([x for x in rest if p < x])
                return a + [p] + b

.. _patterns:

Patterns
--------

Patterns fulfill two purposes: they impose (structural) constraints on
the subject and they specify which data values should be extracted from
the subject and bound to variables. In iterable unpacking, which can be
seen as a prototype for pattern matching in Python, there is only one
*structural pattern* to express sequences while there is a rich set of
*binding patterns* to assign a value to a specific variable or field.
Full pattern matching differs from this in that there is more variety
in structural patterns but only a minimum of binding patterns.

Patterns differ from assignment targets (as in iterable unpacking) in that
they impose additional constraints on the structure of the subject and in
that a subject might safely fail to match a specific pattern at any point
(in iterable unpacking, this constitutes an error). The latter means that
patterns should avoid side effects wherever possible, including binding
values to attributes or subscripts.

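The difference in failure behavior can be seen in a short sketch (Python
3.10+; the variable names are arbitrary). An unpacking assignment with the
wrong number of targets raises ``ValueError``, whereas a pattern that does
not fit simply fails and lets the next case clause be tried::

    data = [1, 2, 3]

    try:
        a, b = data            # iterable unpacking: a mismatch is an error
    except ValueError as exc:
        print("unpacking failed:", exc)

    match data:
        case [a, b]:           # does not fit, so matching moves on quietly
            result = "pair"
        case [a, b, c]:
            result = "triple"

    print(result)  # triple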

A cornerstone of pattern matching is the possibility of arbitrarily
*nesting patterns*. The nesting allows for expressing deep
tree structures (for an example of nested class patterns, see the motivation
section above) as well as alternatives.

Although the structural patterns might superficially look like expressions,
it is important to keep in mind that there is a clear distinction. In fact,
no pattern is or contains an expression. It is more productive to think of
patterns as declarative elements similar to the formal parameters in a
function definition.

Walrus/AS patterns
~~~~~~~~~~~~~~~~~~

Patterns fall into two categories: most patterns impose a (structural)
constraint that the subject needs to fulfill, whereas the capture pattern
binds the subject to a name without regard for the subject's structure or
actual value. Consequently, a pattern can either express a constraint or
bind a value, but not both. Walrus/AS patterns fill this gap in that they
allow the user to specify a general pattern as well as capture the subject
in a variable.

Typical use cases for the Walrus/AS pattern include OR and Class patterns
together with a binding name as in, e.g., ``case BinOp(op := '+'|'-', ...):``
or ``case [first := int(), second := int()]:``. The latter could be
understood as saying that the subject must fulfil two distinct patterns:
``[first, second]`` as well as ``[int(), int()]``. The Walrus/AS pattern
can thus be seen as a special case of an 'and' pattern (see OR patterns
below for an additional discussion of 'and' patterns).

Example using the Walrus/AS pattern::

    def simplify_expr(tokens):
        match tokens:
            case [l:=('('|'['), *expr, r:=(')'|']')] if (l+r) in ('()', '[]'):
                return simplify_expr(expr)
            case [0, op:=('+'|'-'), right]:
                return UnaryOp(op, right)
            case [left:=(int() | float()) | Num(left), '+', right:=(int() | float()) | Num(right)]:
                return Num(left + right)
            case [value:=(int() | float())]:
                return Num(value)

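The walrus spelling above is the notation discussed in this draft. Python
3.10 spells the same binding form ``pattern as name``; for readers who want
to experiment, here is a sketch in that spelling (the function ``parse_op``
and its token format are hypothetical). It captures the operator while
still constraining it with an OR pattern::

    def parse_op(tokens):
        match tokens:
            case [int() as left, ("+" | "-") as op, int() as right]:
                return left + right if op == "+" else left - right
            case [("(" | "[") as l, *inner, (")" | "]") as r] if l + r in ("()", "[]"):
                return parse_op(inner)   # strip matching brackets
            case _:
                return None

    print(parse_op([3, "+", 4]))             # 7
    print(parse_op(["(", 10, "-", 4, ")"]))  # 6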

OR patterns
~~~~~~~~~~~

The OR pattern allows you to combine 'structurally equivalent' alternatives
into a new pattern, i.e. several patterns can share a common handler. If any
one of an OR pattern's subpatterns matches the given subject, the entire OR
pattern succeeds.

Some statically typed languages prohibit the binding of names (capture
patterns) inside an OR pattern because of potential conflicts concerning the
types of variables. As a dynamically typed language, Python can be less
restrictive here and allow capture patterns inside OR patterns. However, each
subpattern must bind the same set of variables so as not to leave potentially
undefined names. With two alternatives ``P | Q``, this means that if *P* binds
the variables *u* and *v*, *Q* must bind exactly the same variables *u* and *v*.

There was some discussion on whether to use the bar ``|`` or the keyword
``or`` in order to separate alternatives. The OR pattern does not fully fit
the existing semantics and usage of either of these two symbols. However,
``|`` is the symbol of choice in all programming languages with support for
the OR pattern and is even used in that capacity for regular expressions in
Python as well. Moreover, ``|`` is not only used for bitwise OR, but also
for set unions and dict merging (:pep:`584`).

Other alternatives were considered as well, but none of these would allow
OR-patterns to be nested inside other patterns:

- *Using a comma*::

      case 401, 403, 404:
          print("Some HTTP error")

  This looks too much like a tuple -- we would have to find a different way
  to spell tuples, and the construct would have to be parenthesized inside
  the argument list of a class pattern. In general, commas already have many
  different meanings in Python; we shouldn't add more.

- *Using stacked cases*::

      case 401:
      case 403:
      case 404:
          print("Some HTTP error")

  This is how this would be done in *C*, using its fall-through semantics
  for cases. However, we don't want to mislead people into thinking that
  match/case uses fall-through semantics (which are a common source of bugs
  in *C*). Also, this would be a novel indentation pattern, which might make
  it harder to support in IDEs and such (it would break the simple rule "add
  an indentation level after a line ending in a colon"). Finally, this
  would not support OR patterns nested inside other patterns, either.

- *Using "case in" followed by a comma-separated list*::

      case in 401, 403, 404:
          print("Some HTTP error")

  This would not work for OR patterns nested inside other patterns, like::

      case Point(0|1, 0|1):
          print("A corner of the unit square")

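For comparison, a runnable sketch of the nested OR pattern from the last
bullet (Python 3.10+). It assumes ``Point`` is a data class, whose generated
``__match_args__`` permits the positional subpatterns; the function
``is_unit_corner`` is hypothetical. The last lines check that alternatives
binding different names are rejected when the code is compiled::

    from dataclasses import dataclass

    @dataclass
    class Point:
        x: int
        y: int

    def is_unit_corner(p):
        match p:
            case Point(0 | 1, 0 | 1):   # OR patterns nested in a class pattern
                return True
            case _:
                return False

    print(is_unit_corner(Point(1, 0)))  # True
    print(is_unit_corner(Point(2, 0)))  # False

    # Both alternatives must bind the same names, so this source is rejected:
    try:
        compile("match s:\n    case ('a', x) | ('b', y): pass\n", "<demo>", "exec")
    except SyntaxError as exc:
        print("SyntaxError:", exc.msg)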

*AND and NOT patterns.*
This proposal defines an OR-pattern (``|``) to match one of several
alternates; why not also an AND-pattern (``&``) or even a NOT-pattern
(``!``)? The question is especially pertinent given that some other
languages (``F#`` for example) support AND-patterns.

However, it is not clear how useful this would be. The semantics for matching
dictionaries, objects and sequences already incorporates an implicit 'and':
all attributes and elements mentioned must be present for the match to
succeed. Guard conditions can also support many of the use cases that a
hypothetical 'and' operator would be used for.

A negation of a match pattern using the operator ``!`` as a prefix would match
exactly if the pattern itself does not match. For instance, ``!(3 | 4)``
would match anything except ``3`` or ``4``. However, there is evidence from
other languages that this is rarely useful and primarily used as double
negation ``!!`` to control variable scopes and prevent variable bindings
(which does not apply to Python). Other use cases are better expressed using
guards.

In the end, it was decided that this would make the syntax more complex
without adding a significant benefit.

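As a sketch of how guards cover these cases (the function ``check`` is
hypothetical; Python 3.10+), an 'and' of two constraints can be written as
a class pattern plus a guard, and the hypothetical NOT pattern ``!(3 | 4)``
as a capture pattern with a guard::

    def check(value):
        match value:
            case str(s) if s.isupper():   # 'and': a str AND upper-case
                return "upper-case string"
            case x if x not in (3, 4):    # instead of the NOT pattern !(3 | 4)
                return "neither 3 nor 4"
            case _:
                return "3 or 4"

    print(check("ABC"))  # upper-case string
    print(check(5))      # neither 3 nor 4
    print(check(3))      # 3 or 4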

Example using the OR pattern::

    def simplify(expr):
        match expr:
            case ('/', 0, 0):
                return expr
            case ('*' | '/', 0, _):
                return 0
            case ('+' | '-', x, 0) | ('+', 0, x) | ('*', 1, x) | ('*' | '/', x, 1):
                return x
        return expr

.. _literal_pattern:

Literal Patterns
~~~~~~~~~~~~~~~~

Literal patterns are a convenient way for imposing constraints on the
value of a subject, rather than its type or structure. Literal patterns
even allow you to emulate a switch statement using pattern matching.

Generally, the subject is compared to a literal pattern by means of standard
equality (``x == y`` in Python syntax). Consequently, the literal patterns
``1.0`` and ``1`` match exactly the same set of objects, i.e. ``case 1.0:``
and ``case 1:`` are fully interchangeable. In principle, ``True`` would also
match the same set of objects because ``True == 1`` holds. However, we
believe that many users would be surprised to find that ``case True:``
matched the subject ``1.0``, resulting in some subtle bugs and convoluted
workarounds. We therefore adopted the rule that the three singleton
objects ``None``, ``False`` and ``True`` match by identity (``x is y`` in
Python syntax) rather than equality. Hence, ``case True:`` will match only
``True`` and nothing else. Note that ``case 1:`` would still match ``True``,
though, because the literal pattern ``1`` works by equality and not identity.

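A quick sketch of these two rules (the helpers ``match_true`` and
``match_one`` are hypothetical; Python 3.10+)::

    def match_true(subject):
        match subject:
            case True:      # compared by identity (subject is True)
                return True
        return False

    def match_one(subject):
        match subject:
            case 1:         # compared by equality (subject == 1)
                return True
        return False

    print(match_true(True), match_true(1), match_true(1.0))  # True False False
    print(match_one(1), match_one(1.0), match_one(True))     # True True True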

Early ideas to induce a hierarchy on numbers so that ``case 1.0`` would
match both the integer ``1`` and the floating point number ``1.0``, whereas
``case 1:`` would only match the integer ``1``, were eventually dropped in
favor of the simpler and more consistent rule based on equality. Moreover,
any additional checks whether the subject is an instance of
``numbers.Integral`` would come at a high runtime cost, only to introduce
what would essentially be a novelty in Python. When needed, the explicit
syntax ``case int(1):`` might be used.

Recall that literal patterns are *not* expressions, but directly denote a
specific value or object. From a syntactical point of view, we have to
ensure that negative and complex numbers can equally be used as patterns,
although they are not atomic literal values (i.e. the seeming literal value
``-3+4j`` would syntactically be an expression of the form
``BinOp(UnaryOp('-', 3), '+', 4j)``, but as expressions are not part of
patterns, we added syntactic support for such complex value literals without
having to resort to full expressions). Interpolated *f*-strings, on the
other hand, are not literal values, despite their appearance, and therefore
cannot be used as literal patterns (string concatenation, however,
is supported).

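The following sketch (the function ``kind`` is hypothetical; Python 3.10+)
exercises a negative number, a complex literal and implicit string
concatenation as literal patterns, and then checks that an interpolated
*f*-string is rejected when used as a pattern::

    def kind(subject):
        match subject:
            case -1:
                return "minus one"
            case -3+4j:                # complex literal in a pattern
                return "complex"
            case "hello " "world":     # adjacent strings are concatenated
                return "greeting"
            case _:
                return "other"

    print(kind(-1), kind(complex(-3, 4)), kind("hello world"))

    fstring_source = 'match s:\n    case f"{s}":\n        pass\n'
    try:
        compile(fstring_source, "<fstring>", "exec")
    except SyntaxError as exc:
        print("rejected:", exc.msg)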

Literal patterns not only occur as patterns in their own right, but also
as keys in *mapping patterns*.

Example using Literal patterns::

    def simplify(expr):
        match expr:
            case ('+', 0, x):
                return x
            case ('+' | '-', x, 0):
                return x
            case ('and', True, x):
                return x
            case ('and', False, x):
                return False
            case ('or', False, x):
                return x
            case ('or', True, x):
                return True
            case ('not', ('not', x)):
                return x
        return expr

.. _capture_pattern:

Capture Patterns
~~~~~~~~~~~~~~~~

Capture patterns take on the form of a name that accepts any value and binds
it to a (local) variable (unless the name is declared as ``nonlocal`` or
``global``). In that sense, a simple capture pattern is basically equivalent
to a parameter in a function definition (when the function is called, each
parameter binds the respective argument to a local variable in the function's
scope).

A name used for a capture pattern must not coincide with another capture
pattern in the same pattern. This, again, is similar to parameters, which
equally require each parameter name to be unique within the list of
parameters. It differs, however, from iterable unpacking assignment, where
the repeated use of a variable name as target is permissible (e.g.,
``x, x = 1, 2``). The rationale for not supporting ``(x, x)`` in patterns
is its ambiguous reading: it could be seen as in iterable unpacking where
only the second binding to ``x`` survives. But it could be equally seen as
expressing a tuple with two equal elements (which comes with its own issues).
Should the need arise, then it is still possible to introduce support for
repeated use of names later on.

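The restriction is enforced when the code is compiled, as this sketch
shows (Python 3.10+; the source string is illustrative only): repeated
targets are accepted in an unpacking assignment, but not in a pattern::

    x, x = 1, 2
    print(x)  # 2: only the second binding survives

    source = "match point:\n    case (x, x):\n        pass\n"
    try:
        compile(source, "<pattern>", "exec")
    except SyntaxError as exc:
        print("rejected:", exc.msg)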

There were calls to explicitly mark capture patterns and thus identify them
as binding targets. According to that idea, a capture pattern would be
written as, e.g. ``?x``, ``$x`` or ``=x``. The aim of such explicit capture
markers is to let an unmarked name be a constant value pattern (see below).
However, this is based on the misconception that pattern matching was an
extension of *switch* statements, placing the emphasis on fast switching based
on (ordinal) values. Such a *switch* statement has indeed been proposed for
Python before (see :pep:`275` and :pep:`3103`). Pattern matching, on the other
hand, builds a generalized concept of iterable unpacking. Binding values
extracted from a data structure is at the very core of the concept and hence
the most common use case. Explicit markers for capture patterns would thus
betray the objective of the proposed pattern matching syntax and simplify
a secondary use case at the expense of additional syntactic clutter for
core cases.

Example using Capture patterns::

    def average(*args):
        match args:
            case [x, y]:           # captures the two elements of a sequence
                return (x + y) / 2
            case [x]:              # captures the only element of a sequence
                return x
            case []:
                return 0
            case x:                # captures the entire sequence
                return sum(x) / len(x)

.. _wildcard_pattern:

Wildcard Pattern
~~~~~~~~~~~~~~~~

The wildcard pattern is a special case of a 'capture' pattern: it accepts
any value, but does not bind it to a variable. The idea behind this rule
is to support repeated use of the wildcard in patterns. While ``(x, x)``
is an error, ``(_, _)`` is legal.

Particularly in larger (sequence) patterns, it is important to allow the
pattern to concentrate on values with actual significance while ignoring
anything else. Without a wildcard, it would become necessary to 'invent'
a number of local variables, which would be bound but never used. Even
when sticking to naming conventions and using e.g. ``_1, _2, _3`` to name
irrelevant values, say, this still introduces visual clutter and can hurt
performance (compare the sequence pattern ``(x, y, *z)`` to ``(_, y, *_)``,
where the ``*z`` forces the interpreter to copy a potentially very long
sequence, whereas the second version simply compiles to code along the
lines of ``y = seq[1]``).

There has been much discussion about the choice of the underscore ``_``
as a wildcard pattern, i.e. making this one name non-binding. However, the
underscore is already heavily used as an 'ignore value' marker in iterable
unpacking. Since the wildcard pattern ``_`` never binds, this use of the
underscore does not interfere with other uses such as inside the REPL or
the ``gettext`` module.

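A minimal sketch of the non-binding behavior (Python 3.10+; the string
assigned to ``_`` beforehand merely stands in for, e.g., a ``gettext``
translation function or the REPL's last result)::

    _ = "previously bound"   # stand-in for an existing use of _

    match (1, 2, 3, 4):
        case (_, y, *_):     # wildcards: nothing is bound to _
            pass

    print(y)   # 2
    print(_)   # previously bound: the match left _ untouched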
|
|
|
|
It has been proposed to use ``...`` (i.e., the ellipsis token) or ``*``
|
|
(star) as a wildcard. However, both these look as if an arbitrary number
|
|
of items is omitted::
|
|
|
|
case [a, ..., z]: ...
|
|
case [a, *, z]: ...
|
|
|
|
Both examples look like the would match a sequence of at two or more items,
|
|
capturing the first and last values.
|
|
|
|
A single wildcard clause (i.e. ``case _:``) is semantically equivalent to
|
|
an ``else:``. It accepts any subject without binding it to a variable or
|
|
performing any other operation. However, the wildcard pattern is in
|
|
contrast to ``else`` usable as a subpattern in nested patterns.
|
|
|
|
Finally note that the underscore is as a wildcard pattern in *every*
|
|
programming language with pattern matching that we could find
|
|
(including *C#*, *Elixir*, *Erlang*, *F#*, *Grace*, *Haskell*,
|
|
*Mathematica*, *OCaml*, *Ruby*, *Rust*, *Scala*, *Swift*, and *Thorn*).
|
|
Keeping in mind that many users of Python also work with other programming
|
|
languages, have prior experience when learning Python, or moving on to
|
|
other languages after having learnt Python, we find that such well
|
|
established standards are important and relevant with respect to
|
|
readability and learnability. In our view, concerns that this wildcard
|
|
means that a regular name received special treatment are not strong
|
|
enough to introduce syntax that would make Python special.
|
|
|
|
Example using the Wildcard pattern::
|
|
|
|
def is_closed(sequence):
|
|
match sequence:
|
|
case [_]: # any sequence with a single element
|
|
return True
|
|
case [start, *_, end]: # a sequence with at least two elements
|
|
return start == end
|
|
case _: # anything
|
|
return False
|
|
|
|
|
|
.. _constant_value_pattern:

Value Patterns
~~~~~~~~~~~~~~

It is good programming style to use named constants for parametric values or
to clarify the meaning of particular values. Clearly, it would be desirable
to write ``case (HttpStatus.OK, body):`` rather than
``case (200, body):``, for example. The main issue that arises here is how to
distinguish capture patterns (variables) from constant value patterns. The
general discussion surrounding this issue has brought forward a plethora of
options, which we cannot list in full here.

Strictly speaking, constant value patterns are not really necessary, since
they could be emulated using guards, e.g.
``case (status, body) if status == HttpStatus.OK:``. Nonetheless, the
convenience of constant value patterns is unquestioned and obvious.
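
To make the comparison concrete, here is a sketch of both spellings side
by side. ``HttpStatus`` is a hypothetical enumeration defined only for this
illustration, the function names are likewise hypothetical, and the
proposed ``match`` syntax (as implemented in Python 3.10) is assumed::

    from enum import Enum

    class HttpStatus(Enum):  # hypothetical, for illustration only
        OK = 200
        NOT_FOUND = 404

    def body_via_guard(reply):
        match reply:
            case (status, body) if status == HttpStatus.OK:
                return body
        return None

    def body_via_value_pattern(reply):
        match reply:
            case (HttpStatus.OK, body):  # dotted name: compared with ==
                return body
        return None

Both functions return ``"hello"`` for ``(HttpStatus.OK, "hello")`` and
``None`` for ``(HttpStatus.NOT_FOUND, "hello")``; the value pattern simply
avoids the throw-away name ``status`` and the extra guard.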
|
|
|
|
The observation that constants tend to be written in uppercase letters or
collected in enumeration-like namespaces suggests possible rules to discern
constants syntactically. However, the idea of using upper vs. lower case as
a marker has been met with scepticism since there is no similar precedent
in core Python (although it is common in other languages). We therefore only
adopted the rule that any dotted name (i.e. attribute access) is to be
interpreted as a constant value pattern, like ``HttpStatus.OK``
above. This precludes, in particular, local variables from acting as
constants.
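
The following sketch shows the consequence of this rule: a dotted name is
compared against the subject, whereas a bare name is always a capture
pattern and matches anything. ``HttpStatus`` is again a hypothetical
enumeration, ``describe`` a hypothetical function, and Python 3.10
semantics are assumed::

    from enum import Enum

    class HttpStatus(Enum):  # hypothetical, for illustration only
        OK = 200
        NOT_FOUND = 404

    def describe(status):
        match status:
            case HttpStatus.OK:          # value pattern (dotted name)
                return "ok"
            case HttpStatus.NOT_FOUND:   # value pattern (dotted name)
                return "not found"
            case other:                  # capture pattern (bare name)
                return f"unexpected: {other!r}"

With these definitions, ``describe(HttpStatus.OK)`` returns ``"ok"``,
while ``describe(500)`` returns ``"unexpected: 500"``. Writing ``case OK:``
instead of ``case HttpStatus.OK:`` would bind the subject to a new name
``OK`` rather than compare it with a constant (and such an irrefutable
pattern is only allowed in the last case).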
|
|
|
|
Global variables can only be used directly as constants when they are
defined in other modules (and accessed as ``module.CONSTANT``), although
there are workarounds to access the current module as a namespace as
well. A proposed rule to use a leading dot (e.g. ``.CONSTANT``) for that
purpose was criticised because it was felt that the dot would not be a
sufficiently visible marker. Partly inspired by use cases in other
programming languages, a number of different markers/sigils were proposed
(such as ``^CONSTANT``, ``$CONSTANT``, ``==CONSTANT``, ``CONSTANT?``, or
the word enclosed in backticks), although there was no obvious or natural
choice. The current proposal therefore leaves the discussion and possible
introduction of such a 'constant' marker for future PEPs.

Distinguishing the semantics of names based on whether they are global
variables (i.e. the compiler would treat global variables as constants rather
than capture patterns) leads to various issues. The addition or alteration
of a global variable in the module could have unintended side effects on
patterns. Moreover, pattern matching could not be used directly inside a
module's scope because all variables would be global, making capture
patterns impossible.

Example using the Value pattern::

    def handle_reply(reply):
        match reply:
            case (HttpStatus.OK, MimeType.TEXT, body):
                process_text(body)
            case (HttpStatus.OK, MimeType.APPL_ZIP, body):
                text = deflate(body)
                process_text(text)
            case (HttpStatus.MOVED_PERMANENTLY, new_URI):
                resend_request(new_URI)
            case (HttpStatus.NOT_FOUND, *_):  # any reply tuple starting with NOT_FOUND
                raise ResourceNotFound()


Group Patterns
~~~~~~~~~~~~~~

Allowing users to explicitly specify the grouping is particularly helpful
in the case of OR patterns.
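
As a sketch, assuming Python 3.10 semantics (the function ``route`` and
its request sequences are hypothetical), parentheses group each OR pattern
so that it is obvious that ``as`` names the whole alternative rather than
only its last option. Since ``as`` already binds more loosely than ``|``,
the grouping does not change the meaning here; it makes it visible::

    def route(request):
        match request:
            case [("GET" | "HEAD") as method, path]:
                return f"read {path} via {method}"
            case [("POST" | "PUT"), path]:
                return f"write {path}"
            case _:
                return "unsupported"

For example, ``route(["HEAD", "/index"])`` returns
``"read /index via HEAD"``, and ``route(("DELETE", "/index"))`` returns
``"unsupported"``.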
|
|
|
|
|
|
.. _sequence_pattern:

Sequence Patterns
~~~~~~~~~~~~~~~~~

Sequence patterns follow as closely as possible the already established
syntax and semantics of iterable unpacking. Of course, subpatterns take
the place of assignment targets (variables, attributes and subscripts).
Moreover, the sequence pattern only matches a carefully selected set of
possible subjects, whereas iterable unpacking can be applied to any
iterable.

- As in iterable unpacking, we do not distinguish between 'tuple' and
  'list' notation. ``[a, b, c]``, ``(a, b, c)`` and ``a, b, c`` are all
  equivalent. While this means we have a redundant notation and checking
  specifically for lists or tuples requires more effort (e.g.
  ``case list([a, b, c])``), we mimic iterable unpacking as much as
  possible.

- A starred pattern will capture a sub-sequence of arbitrary length,
  mirroring iterable unpacking as well. Only one starred item may be
  present in any sequence pattern. In theory, patterns such as ``(*_, 3, *_)``
  could be understood as expressing any sequence containing the value ``3``.
  In practice, however, this would only work for a very narrow set of use
  cases and lead to inefficient backtracking or even ambiguities otherwise.

- The sequence pattern does *not* iterate through an iterable subject. All
  elements are accessed through subscripting and slicing, and the subject
  must be an instance of ``collections.abc.Sequence``. This includes, in
  particular, lists and tuples, but excludes sets and dictionaries; strings
  and bytes, although they are sequences, are exempted as explained below.

  A sequence pattern cannot just iterate through any iterable object. The
  consumption of elements from the iteration would have to be undone if the
  overall pattern fails, which is not possible.

  Relying on ``len()``, subscripting and slicing alone does not work to
  identify sequences, because sequences share these protocols with the more
  general mappings (such as dictionaries). It would be surprising if a
  sequence pattern also matched dictionaries or other custom objects that
  implement the mapping protocol (i.e. ``__getitem__``). The interpreter
  therefore performs an instance check to ensure that the subject in
  question really is a sequence (of known type).

  String and bytes objects have a dual nature: they are both 'atomic' objects
  in their own right, as well as sequences (with a strongly recursive nature
  in that a string is a sequence of strings). The typical behavior and use
  cases for strings and bytes are different enough from that of tuples and
  lists to warrant a clear distinction. It is in fact often unintuitive and
  unintended that strings pass for sequences, as evidenced by regular
  questions and complaints. Strings and bytes are therefore not matched by a
  sequence pattern, limiting the sequence pattern to a very specific
  understanding of 'sequence'.
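
The rules above can be sketched as follows, assuming Python 3.10
semantics; the function name ``shape`` is hypothetical. A class pattern
such as ``tuple([x, y])`` restricts the match to one specific sequence
type, a starred subpattern always captures a list, and strings, iterators
and dictionaries are not matched at all::

    def shape(subject):
        match subject:
            case tuple([x, y]):     # only tuples of length two
                return f"2-tuple {x!r}, {y!r}"
            case [x, y]:            # any other sequence of length two
                return f"pair {x!r}, {y!r}"
            case [first, *rest]:    # `rest` is always a list
                return f"first={first!r}, rest={rest!r}"
            case _:
                return "no match"

For instance, ``shape(range(3))`` returns ``"first=0, rest=[1, 2]"``,
whereas ``shape("ab")``, ``shape(iter([1, 2]))`` and
``shape({0: "a", 1: "b"})`` all return ``"no match"``.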
|
|
|
|
|
|
.. _mapping_pattern:

Mapping Patterns
~~~~~~~~~~~~~~~~

Dictionaries or mappings in general are one of the most important and most
widely used data structures in Python. In contrast to sequences, mappings
are built for fast direct access to arbitrary elements (identified by a key).
In most use cases an element is retrieved from a dictionary by a known key
without regard for any ordering or other key-value pairs stored in the same
dictionary. Particularly common are string keys.

The mapping pattern reflects the common usage of dictionary lookup: it allows
the user to extract some values from a mapping by means of constant/known
keys and have the values match given subpatterns. Moreover, the mapping
pattern does not check for the presence of additional keys. Should it be
necessary to impose an upper bound on the mapping and ensure that no
additional keys are present, then the usual double-star-pattern ``**rest``
can be used to capture the remaining items, combined with a guard such as
``if not rest``. The special case ``**_`` with a wildcard, however, is not
supported, as it would not have any effect but might lead to a wrong
understanding of the mapping pattern's semantics.

To avoid overly expensive matching algorithms, keys must be literals or
constant values.

Example using the Mapping pattern::

    def change_red_to_blue(json_obj):
        match json_obj:
            case { 'color': ('red' | '#FF0000') }:
                json_obj['color'] = 'blue'
            case { 'children': children }:
                for child in children:
                    change_red_to_blue(child)
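
The remarks on additional keys can be sketched as follows, assuming
Python 3.10 semantics (``parse_point`` is a hypothetical name). The first
clause uses ``**rest`` with a guard to insist on exactly the keys ``"x"``
and ``"y"``; the second clause simply ignores any extra keys::

    def parse_point(obj):
        match obj:
            case {"x": x, "y": y, **rest} if not rest:
                return ("exact", x, y)
            case {"x": x, "y": y}:  # additional keys are not checked
                return ("extra keys ignored", x, y)
            case _:
                return None

Thus ``parse_point({"x": 1, "y": 2})`` returns ``("exact", 1, 2)``, while
``parse_point({"x": 1, "y": 2, "z": 3})`` returns
``("extra keys ignored", 1, 2)``.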
|
|
|
|
|
|
.. _class_pattern:

Class Patterns
~~~~~~~~~~~~~~

Class patterns fulfil two purposes: checking whether a given subject is
indeed an instance of a specific class, and extracting data from specific
attributes of the subject. A quick survey revealed that ``isinstance()``
is indeed one of the most often used functions in Python in terms of
static occurrences in programs. Such instance checks typically precede
a subsequent access to information stored in the object, or a possible
manipulation thereof. A typical pattern might be along the lines of::

    def traverse_tree(node):
        if isinstance(node, Node):
            traverse_tree(node.left)
            traverse_tree(node.right)
        elif isinstance(node, Leaf):
            print(node.value)

In many cases, however, class patterns occur nested, as in the example
given in the motivation::

    if (isinstance(node, BinOp) and node.op == "+"
            and isinstance(node.right, BinOp) and node.right.op == "*"):
        a, b, c = node.left, node.right.left, node.right.right
        # Handle a + b*c

The class pattern lets you concisely specify both an instance check and
relevant attributes (with possible further constraints). It is
therefore very tempting to write, e.g., ``case Node(left, right):`` in the
first case above and ``case Leaf(value):`` in the second. While this
indeed works well for languages with strict algebraic data types, it is
problematic with the structure of Python objects.

When dealing with general Python objects, we face a potentially very large
number of unordered attributes: an instance of ``Node`` contains a large
number of attributes (most of which are special methods such as
``__repr__``). Moreover, the interpreter cannot reliably deduce which of
the attributes comes first and which comes second. For an object that
represents a circle, say, there is no inherently obvious ordering of the
attributes ``x``, ``y`` and ``radius``.

We envision two possibilities for dealing with this issue: either explicitly
name the attributes of interest or provide an additional mapping that tells
the interpreter which attributes to extract and in which order. Both
approaches are supported. Moreover, explicitly naming the attributes of
interest lets you further specify the required structure of an object; if
an object lacks an attribute specified by the pattern, the match fails.

- Attributes that are explicitly named pick up the syntax of named arguments.
  If an object of class ``Node`` has two attributes ``left`` and ``right``
  as above, the pattern ``Node(left=x, right=y)`` will extract the values of
  both attributes and assign them to ``x`` and ``y``, respectively. The data
  flow from left to right seems unusual, but is in line with mapping patterns
  and has precedents such as assignments via ``as`` in *with*- or
  *import*-statements.

  Naming the attributes in question explicitly will be mostly used for more
  complex cases where the positional form (below) is insufficient.

- The class field ``__match_args__`` specifies a number of attributes
  together with their ordering, allowing class patterns to rely on positional
  sub-patterns without having to explicitly name the attributes in question.
  This is particularly handy for smaller objects or instances of data classes,
  where the attributes of interest are rather obvious and often have a
  well-defined ordering. In a way, ``__match_args__`` is similar to the
  declaration of formal parameters, which allows calling functions with
  positional arguments rather than naming all the parameters.
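
A sketch of both forms, assuming Python 3.10 semantics: data classes
generate ``__match_args__`` from their fields, whereas the hypothetical
plain class ``Circle`` declares it by hand. All class and function names
here are illustrative only::

    from dataclasses import dataclass

    @dataclass
    class Node:
        left: object
        right: object

    @dataclass
    class Leaf:
        value: object

    class Circle:
        # Declares which attributes positional subpatterns refer to, in order.
        __match_args__ = ("x", "y", "radius")

        def __init__(self, x, y, radius):
            self.x, self.y, self.radius = x, y, radius

    def summarize(obj):
        match obj:
            case Node(Leaf(a), Leaf(b)):     # positional, via __match_args__
                return f"node of leaves {a} and {b}"
            case Node(right=Leaf(value=v)):  # keyword form: only `right` is checked
                return f"node whose right leaf is {v}"
            case Circle(x, y, radius=r):     # positional and keyword combined
                return f"circle at ({x}, {y}) with radius {r}"
            case _:
                return "something else"

For example, ``summarize(Circle(0, 0, 5))`` returns
``"circle at (0, 0) with radius 5"``, and
``summarize(Node(Node(Leaf(1), Leaf(2)), Leaf(3)))`` returns
``"node whose right leaf is 3"``.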
|
|
|
|
|
|
The syntax of class patterns is based on the idea that de-construction
mirrors the syntax of construction. This is already the case in virtually
any Python construct, be it assignment targets, function definitions or
iterable unpacking. In all these cases, we find that the syntax for
sending and that for receiving 'data' are virtually identical.

- Assignment targets such as variables, attributes and subscripts:
  ``foo.bar[2] = foo.bar[3]``;

- Function definitions: a function defined with ``def foo(x, y, z=6)``
  is called as, e.g., ``foo(123, y=45)``, where the actual arguments
  provided at the call site are matched against the formal parameters
  at the definition site;

- Iterable unpacking: ``a, b = b, a`` or ``[a, b] = [b, a]`` or
  ``(a, b) = (b, a)``, just to name a few equivalent possibilities.

Using the same syntax for reading and writing, l- and r-values, or
construction and de-construction is widely accepted for its benefits in
thinking about data, its flow and manipulation. This equally extends to
the explicit construction of instances, where class patterns ``c(p, q)``
deliberately mirror the syntax of creating instances.
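
As an illustration, assuming Python 3.10 semantics and a hypothetical data
class ``BinOp`` whose fields are declared in the order ``left``, ``op``,
``right``, the nested ``isinstance()`` check shown earlier can be written
as a pattern that has exactly the shape of the constructor call that
builds the expression (``split_sum_of_product`` is likewise hypothetical)::

    from dataclasses import dataclass

    @dataclass
    class BinOp:  # hypothetical node type; field order defines __match_args__
        left: object
        op: str
        right: object

    def split_sum_of_product(node):
        match node:
            case BinOp(a, "+", BinOp(b, "*", c)):  # handle a + b*c
                return (a, b, c)
            case _:
                return None

    expr = BinOp(1, "+", BinOp(2, "*", 3))  # construction mirrors the pattern

Here ``split_sum_of_product(expr)`` returns ``(1, 2, 3)``.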
|
|
|
|
|
|
|
|
History and Context
===================

Pattern matching emerged in the late 1970s in the form of tuple unpacking
and as a means to handle recursive data structures such as linked lists or
trees (object-oriented languages usually use the visitor pattern for handling
recursive data structures). The early proponents of pattern matching
organised structured data in 'tagged tuples' rather than ``struct`` as in
*C* or the objects introduced later. A node in a binary tree would, for
instance, be a tuple with two elements for the left and right branches,
respectively, and a ``Node`` tag, written as ``Node(left, right)``. In
Python we would probably put the tag inside the tuple as
``('Node', left, right)`` or define a data class ``Node`` to achieve the
same effect.

Using modern syntax, a depth-first tree traversal would then be written as
follows::

    def traverse_tree(node):
        match node:
            case Node(left, right):
                traverse_tree(left)
                traverse_tree(right)
            case Leaf(value):
                handle(value)

The notion of handling recursive data structures with pattern matching
immediately gave rise to the idea of handling more general recursive
'patterns' (i.e. recursion beyond recursive data structures)
with pattern matching. Pattern matching would thus also be used to define
recursive functions such as::

    def fib(arg):
        match arg:
            case 0:
                return 1
            case 1:
                return 1
            case n:
                return fib(n-1) + fib(n-2)

As pattern matching was repeatedly integrated into new and emerging
programming languages, its syntax slightly evolved and expanded. The
first two cases in the ``fib`` example above could be written more
succinctly as ``case 0 | 1:`` with ``|`` denoting alternative patterns.
Moreover, the underscore ``_`` was widely adopted as a wildcard, a filler
where neither the structure nor value of parts of a pattern were of
substance. Since the underscore is already frequently used in equivalent
capacity in Python's iterable unpacking (e.g.,
``_, _, third, *_ = something``) we kept these universal standards.

It is noteworthy that the concept of pattern matching has always been
closely linked to the concept of functions. The different case clauses
have always been considered as something like semi-independent functions
where pattern variables take on the role of parameters. This becomes
most apparent when pattern matching is written as an overloaded function,
along the lines of (Standard ML)::

    fun fib 0 = 1
      | fib 1 = 1
      | fib n = fib (n-1) + fib (n-2)

Even though such a strict separation of case clauses into independent
functions does not make sense in Python, we find that patterns share many
syntactic rules with parameters, such as binding arguments to unqualified
names only or that variable/parameter names must not be repeated for
a particular pattern/function.

With its emphasis on abstraction and encapsulation, object-oriented
programming posed a serious challenge to pattern matching. In short: in
object-oriented programming, we can no longer view objects as tagged tuples.
The arguments passed into the constructor do not necessarily specify the
attributes or fields of the objects. Moreover, there is no longer a strict
ordering of an object's fields and some of the fields might be private and
thus inaccessible. And on top of this, the given object might actually be
an instance of a subclass with slightly different structure.

To address this challenge, patterns became increasingly independent of the
original tuple constructors. In a pattern like ``Node(left, right)``,
``Node`` is no longer a passive tag, but rather a function that can actively
check for any given object whether it has the right structure and extract a
``left`` and ``right`` field. In other words: the ``Node``-tag becomes a
function that transforms an object into a tuple or returns some failure
indicator if it is not possible.

In Python, we simply use ``isinstance()`` together with the ``__match_args__``
field of a class to check whether an object has the correct structure and
then transform some of its attributes into a tuple. For the ``Node`` example
above, for instance, we would have ``__match_args__ = ('left', 'right')`` to
indicate that these two attributes should be extracted to form the tuple.
That is, ``case Node(x, y)`` would first check whether a given object is an
instance of ``Node`` and then assign ``left`` to ``x`` and ``right`` to ``y``,
respectively.

Paying tribute to Python's dynamic nature with 'duck typing', however, we
also added a more direct way to specify the presence of, or constraints on,
specific attributes. Instead of ``Node(x, y)`` you could also write
``object(left=x, right=y)``, effectively eliminating the ``isinstance()``
check and thus supporting any object with ``left`` and ``right`` attributes.
Or you could combine these ideas to write ``Node(right=y)`` so as to require
an instance of ``Node`` but only extract the value of the ``right`` attribute.
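
The two variants described in the previous paragraph can be sketched as
follows, assuming Python 3.10 semantics; ``Node`` and ``children`` are
hypothetical names, and ``SimpleNamespace`` merely stands in for an
arbitrary object that happens to have the right attributes::

    from dataclasses import dataclass
    from types import SimpleNamespace

    @dataclass
    class Node:
        left: object
        right: object

    def children(obj):
        match obj:
            case Node(right=y):            # requires a Node, extracts `right` only
                return ("Node", y)
            case object(left=x, right=y):  # duck typing: any object with both attributes
                return ("duck", x, y)
            case _:
                return None

    duck = SimpleNamespace(left=1, right=2)

With these definitions, ``children(Node(1, 2))`` returns ``("Node", 2)``
and ``children(duck)`` returns ``("duck", 1, 2)``, while an object lacking
either attribute is not matched by the second clause.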
|
|
|
|
|
|
|
|
Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: