PEP: 635
Title: Structural Pattern Matching: Motivation and Rationale
Version: $Revision$
Last-Modified: $Date$
Author: Tobias Kohn <kohnt@tobiaskohn.ch>,
        Guido van Rossum <guido@python.org>
BDFL-Delegate:
Discussions-To: Python-Dev <python-dev@python.org>
Status: Draft
Type: Informational
Content-Type: text/x-rst
Created: 12-Sep-2020
Python-Version: 3.10
Post-History: 22-Oct-2020
Resolution:


Abstract
========

This PEP provides the motivation and rationale for PEP 634
("Structural Pattern Matching: Specification"). First-time readers
are encouraged to start with PEP 636, which provides a gentler
introduction to the concepts, syntax and semantics of patterns.


Motivation
==========

(Structural) pattern matching syntax is found in many languages, from
Haskell, Erlang and Scala to Elixir and Ruby. (A proposal for
JavaScript is also under consideration.)

Python already supports a limited form of this through sequence
unpacking assignments, which the new proposal leverages.

Several other common Python idioms are also relevant:

- The ``if ... elif ... elif ... else`` idiom is often used to find
  out the type or shape of an object in an ad-hoc fashion, using one
  or more checks like ``isinstance(x, cls)``, ``hasattr(x, "attr")``,
  ``len(x) == n`` or ``"key" in x`` as guards to select an applicable
  block. The block can then assume ``x`` supports the interface
  checked by the guard. For example::

    if isinstance(x, tuple) and len(x) == 2:
        host, port = x
        mode = "http"
    elif isinstance(x, tuple) and len(x) == 3:
        host, port, mode = x
    # Etc.

  Code like this is more elegantly rendered using ``match``::

    match x:
        case host, port:
            mode = "http"
        case host, port, mode:
            pass
        # Etc.

- AST traversal code often looks for nodes matching a given pattern;
  for example, the code to detect a node of the shape "A + B * C" might
  look like this::

    if (isinstance(node, BinOp) and node.op == "+"
            and isinstance(node.right, BinOp) and node.right.op == "*"):
        a, b, c = node.left, node.right.left, node.right.right
        # Handle a + b*c

  Using ``match`` this becomes more readable::

    match node:
        case BinOp("+", a, BinOp("*", b, c)):
            # Handle a + b*c

We believe that adding pattern matching to Python will enable Python
users to write cleaner, more readable code for examples like those
above, and many others.

For a more academic discussion of this proposal, see [1]_.


Pattern Matching and OO
-----------------------

Pattern matching is complementary to the object-oriented paradigm.
Using OO and inheritance we can easily define a method on a base class
that defines default behavior for a specific operation on that class,
and we can override this default behavior in subclasses. We can also
use the Visitor pattern to separate actions from data.

But this is not sufficient for all situations. For example, a code
generator may consume an AST, and have many operations where the
generated code needs to vary based not just on the class of a node,
but also on the value of some class attributes, like the ``BinOp``
example above. The Visitor pattern is insufficiently flexible for
this: it can only select based on the class.

For a complete example, see
https://github.com/gvanrossum/patma/blob/master/examples/expr.py#L231

Like the Visitor pattern, pattern matching allows for a strict separation
of concerns: specific actions or data processing are independent of the
class hierarchy or manipulated objects. When dealing with predefined or
even built-in classes, in particular, it is often impossible to add further
methods to the individual classes. Pattern matching not only relieves the
programmer or class designer of the burden of the boilerplate code needed
for the Visitor pattern, but is also flexible enough to work directly with
built-in types. It naturally distinguishes between sequences of different
lengths, which might all share the same class despite obviously differing
structures. Moreover, pattern matching automatically takes inheritance
into account: a class *D* inheriting from *C* will be handled by a pattern
that targets *C* by default.

Object-oriented programming is geared towards single dispatch: it is a
single instance (or the type thereof) that determines which method is to
be called. This leads to a somewhat artificial situation in the case of
binary operators, where both objects might play an equal role in deciding
which implementation to use (Python addresses this through the use of
reversed binary methods). Pattern matching is structurally better suited
to handle such situations of multiple dispatch, where the action to be
taken depends on the types of several objects in equal parts.

Patterns and Functional Style
-----------------------------

Many Python applications and libraries are not written in a consistent
OO style -- unlike Java, Python encourages defining functions at the
top-level of a module, and for simple data structures, tuples (or
named tuples or lists) and dictionaries are often used exclusively or
mixed with classes or data classes.

Pattern matching is particularly suitable for picking apart such data
structures. As an extreme example, it's easy to write code that picks
apart a JSON data structure using ``match``::

    match json_pet:
        case {"type": "cat", "name": name, "pattern": pattern}:
            return Cat(name, pattern)
        case {"type": "dog", "name": name, "breed": breed}:
            return Dog(name, breed)
        case _:
            raise ValueError("Not a suitable pet")

Functional programming generally prefers a declarative style with a focus
on relationships in data. Side effects are avoided whenever possible.
Pattern matching thus fits naturally with, and strongly supports, a
functional programming style.


Rationale
=========

This section provides the rationale for individual design decisions.
It takes the place of "Rejected ideas" in the standard PEP format.
It is organized in sections corresponding to the specification (PEP 634).


Overview and Terminology
------------------------

Much of the power of pattern matching comes from the nesting of subpatterns.
That the success of a pattern match depends directly on the success of
its subpatterns is thus a cornerstone of the design. However, although a
pattern like ``P(Q(), R())`` succeeds only if both subpatterns ``Q()``
and ``R()`` succeed (i.e. the success of pattern ``P`` depends on ``Q``
and ``R``), the pattern ``P`` is checked first. If ``P`` fails, neither
``Q()`` nor ``R()`` will be tried (this is a direct consequence of the
fact that if ``P`` fails, there are no subjects to match against ``Q()``
and ``R()`` in the first place).

Also note that patterns bind names to values rather than performing an
assignment. This reflects the fact that patterns aim to not have side
effects, which also means that Capture or AS patterns cannot assign a
value to an attribute or subscript. We thus consistently use the term
'bind' instead of 'assign' to emphasise this subtle difference between
traditional assignments and name binding in patterns.

The Match Statement
-------------------

The match statement evaluates an expression to produce a subject, finds the
first pattern that matches the subject, and executes the associated block
of code. Syntactically, the match statement thus takes an expression and
a sequence of case clauses, where each case clause comprises a pattern and
a block of code.

Since case clauses comprise a block of code, they adhere to the existing
indentation scheme with the syntactic structure of
``<keyword> ...: <(indented) block>``, which resembles a compound
statement. The keyword ``case`` reflects its widespread use in
pattern matching languages, ignoring those languages that use other
syntactic means such as a symbol like ``|``, because it would not fit
established Python structures. The syntax of patterns following the
keyword is discussed below.

Given that the case clauses follow the structure of a compound statement,
the match statement naturally becomes a compound statement as well,
following the same syntactic structure. This naturally leads to
``match <expr>: <case_clause>+``. Note that the match statement determines
a quasi-scope in which the evaluated subject is kept alive (although not in
a local variable), similar to how a with statement might keep a resource
alive during execution of its block. Furthermore, control flows from the
match statement to a case clause and then leaves the block of the match
statement. The block of the match statement thus has both syntactic and
semantic meaning.

Various suggestions have sought to eliminate or avoid the naturally arising
"double indentation" of a case clause's code block. Unfortunately, all such
proposals of *flat indentation schemes* come at the expense of violating
Python's established structural paradigm, leading to additional syntactic
rules:

- *Unindented case clauses.*
  The idea is to align case clauses with the ``match``, i.e.::

    match expression:
    case pattern_1:
        ...
    case pattern_2:
        ...

  This may look awkward to the eye of a Python programmer, because
  everywhere else a colon is followed by an indent. The ``match`` would
  neither follow the syntactic scheme of simple nor composite statements
  but rather establish a category of its own.

- *Putting the expression on a separate line after "match".*
  The idea is to use the expression yielding the subject as a statement
  to avoid the singularity of ``match`` having no actual block despite
  the colons::

    match:
        expression
    case pattern_1:
        ...
    case pattern_2:
        ...

  This was ultimately rejected because the first block would be another
  novelty in Python's grammar: a block whose only content is a single
  expression rather than a sequence of statements. Attempts to amend this
  issue by adding or repurposing yet another keyword along the lines of
  ``match: return expression`` did not yield any satisfactory solution.

Although flat indentation would save some horizontal space, the cost of
increased complexity or unusual rules is too high. It would also complicate
life for simple-minded code editors. Finally, the horizontal space issue can
be alleviated by allowing "half-indent" (i.e. two spaces instead of four)
for match statements (though we do not recommend this).

In sample programs using ``match``, written as part of the development of this
PEP, a noticeable improvement in code brevity is observed, more than making
up for the additional indentation level.


*Statement vs. Expression.* Some suggestions centered around the idea of
making ``match`` an expression rather than a statement. However, this
would fit poorly with Python's statement-oriented nature and lead to
unusually long and complex expressions and the need to invent new
syntactic constructs or break well established syntactic rules. An
obvious consequence of ``match`` as an expression would be that case
clauses could no longer have arbitrary blocks of code attached, but only
a single expression. Overall, the strong limitations could in no way
offset the slight simplification in some special use cases.


*Hard vs. Soft Keyword.* There were options to make ``match`` a hard keyword,
or choose a different keyword. Although using a hard keyword would simplify
life for simple-minded syntax highlighters, we decided not to use a hard
keyword for several reasons:

- Most importantly, the new parser doesn't require us to do this. Unlike
  ``async``, which caused hardships while it was a soft keyword for a few
  releases, ``match`` can be made a permanent soft keyword.

- ``match`` is so commonly used in existing code that making it a hard
  keyword would break almost every existing program and would put the
  burden of fixing code on many people who may not even benefit from the
  new syntax.

- It is hard to find an alternative keyword that would not be commonly used
  in existing programs as an identifier, and would still clearly reflect the
  meaning of the statement.

**Use "as" or "|" instead of "case" for case clauses.**
The pattern matching proposed here is a combination of multi-branch control
flow (in line with ``switch`` in Algol-derived languages or ``cond`` in Lisp)
and object-deconstruction as found in functional languages. While the proposed
keyword ``case`` highlights the multi-branch aspect, alternative keywords such
as ``as`` would equally be possible, highlighting the deconstruction aspect.
``as`` or ``with``, for instance, also have the advantage of already being
keywords in Python. However, since ``case`` as a keyword can only occur as a
leading keyword inside a ``match`` statement, it is easy for a parser to
distinguish between its use as a keyword or as a variable.

Other variants would use a symbol like ``|`` or ``=>``, or go entirely without
a special marker.

Since Python is a statement-oriented language in the tradition of Algol, and as
each composite statement starts with an identifying keyword, ``case`` seemed to
be most in line with Python's style and traditions.


Match Semantics
~~~~~~~~~~~~~~~

The patterns of different case clauses might overlap in that more than
one case clause would match a given subject. The first-to-match rule
ensures that the selection of a case clause for a given subject is
unambiguous. Furthermore, case clauses can have increasingly general
patterns matching wider sets of subjects. The first-to-match rule
then ensures that the most precise pattern can be chosen (although it
is the programmer's responsibility to order the case clauses correctly).

In a statically typed language, the match statement would be compiled to
a decision tree to select a matching pattern quickly and very efficiently.
This would, however, require that all patterns be purely declarative and
static, running against the established dynamic semantics of Python. The
proposed semantics thus represent a path incorporating the best of both
worlds: patterns are tried in a strictly sequential order so that each
case clause constitutes an actual statement. At the same time, we allow
the interpreter to cache any information about the subject or change the
order in which subpatterns are tried. In other words: if the interpreter
has found that the subject is not an instance of a class ``C``, it can
directly skip case clauses testing for this again, without having to
perform repeated instance-checks. If a guard stipulates that a variable
``x`` must be positive, say (i.e. ``if x > 0``), the interpreter might
check this directly after binding ``x`` and before any further
subpatterns are considered.

*Binding and scoping.* In many pattern matching implementations, each
case clause would establish a separate scope of its own. Variables bound
by a pattern would then only be visible inside the corresponding case block.
In Python, however, this does not make sense. Establishing separate scopes
would essentially mean that each case clause is a separate function without
direct access to the variables in the surrounding scope (without having to
resort to ``nonlocal``, that is). Moreover, a case clause could no longer
influence any surrounding control flow through standard statements such as
``return`` or ``break``. Hence, such strict scoping would lead to
unintuitive and surprising behavior.

A direct consequence of this is that any variable bindings outlive the
respective case or match statements. Even patterns that only match a
subject partially might bind local variables (this is, in fact, necessary
for guards to function properly). However, these semantics for variable
binding are in line with existing Python structures such as for loops and
with statements.

Guards
~~~~~~

Some constraints cannot be adequately expressed through patterns alone.
For instance, a 'less' or 'greater than' relationship defies the usual
'equal' semantics of patterns. Moreover, different subpatterns are
independent and cannot refer to each other. The addition of *guards*
addresses these restrictions: a guard is an arbitrary expression attached
to a pattern that must evaluate to a "truthy" value for the pattern to
succeed.

For example, ``case [x, y] if x < y:`` uses a guard (``if x < y``) to
express a 'less than' relationship between two otherwise disjoint capture
patterns ``x`` and ``y``.

From a conceptual point of view, patterns describe structural constraints
on the subject in a declarative style, ideally without any side-effects.
Recall, in particular, that patterns are clearly distinct from expressions,
following different objectives and semantics. Guards then enhance case
blocks in a highly controlled way with arbitrary expressions (that might
have side effects). Splitting the overall functionality into a static
structural part and a dynamically evaluated part not only helps with
readability, but can also open up considerable potential for compiler
optimizations. To keep this clear separation, guards are only supported
on the level of case clauses and not for individual patterns.

**Example** using guards::

  def sort(seq):
      match seq:
          case [] | [_]:
              return seq
          case [x, y] if x <= y:
              return seq
          case [x, y]:
              return [y, x]
          case [x, y, z] if x <= y <= z:
              return seq
          case [x, y, z] if x >= y >= z:
              return [z, y, x]
          case [p, *rest]:
              a = sort([x for x in rest if x <= p])
              b = sort([x for x in rest if p < x])
              return a + [p] + b


.. _patterns:

Patterns
--------

Patterns fulfill two purposes: they impose (structural) constraints on
the subject and they specify which data values should be extracted from
the subject and bound to variables. In iterable unpacking, which can be
seen as a prototype to pattern matching in Python, there is only one
*structural pattern* to express sequences while there is a rich set of
*binding patterns* to assign a value to a specific variable or field.
Full pattern matching differs from this in that there is more variety
in structural patterns but only a minimum of binding patterns.

Patterns differ from assignment targets (as in iterable unpacking) in two ways:
they impose additional constraints on the structure of the subject, and
a subject may safely fail to match a specific pattern at any point
(in iterable unpacking, this constitutes an error). The latter means that
patterns should avoid side effects wherever possible.

This desire to avoid side effects is one reason why capture patterns
don't allow binding values to attributes or subscripts: if the
containing pattern were to fail in a later step, it would be hard to
revert such bindings.

A cornerstone of pattern matching is the possibility of arbitrarily
*nesting patterns*. The nesting allows expressing deep
tree structures (for an example of nested class patterns, see the motivation
section above) as well as alternatives.

Although patterns might superficially look like expressions,
it is important to keep in mind that there is a clear distinction. In fact,
no pattern is or contains an expression. It is more productive to think of
patterns as declarative elements similar to the formal parameters in a
function definition.


AS Patterns
~~~~~~~~~~~

Patterns fall into two categories: most patterns impose a (structural)
constraint that the subject needs to fulfill, whereas the capture pattern
binds the subject to a name without regard for the subject's structure or
actual value. Consequently, a pattern can either express a constraint or
bind a value, but not both. AS patterns fill this gap in that they
allow the user to specify a general pattern as well as capture the subject
in a variable.

Typical use cases for the AS pattern include OR and Class patterns
together with a binding name as in, e.g., ``case BinOp('+'|'-' as op, ...):``
or ``case [int() as first, int() as second]:``. The latter could be
understood as saying that the subject must fulfil two distinct patterns:
``[first, second]`` as well as ``[int(), int()]``. The AS pattern
can thus be seen as a special case of an 'and' pattern (see OR patterns
below for an additional discussion of 'and' patterns).

In an earlier version, the AS pattern was devised as a 'Walrus pattern',
written as ``case [first:=int(), second:=int()]``. However, using ``as``
offers some advantages over ``:=``:

- The walrus operator ``:=`` is used to capture the result of an expression
  on the right hand side, whereas ``as`` generally indicates some form of
  'processing' as in ``import foo as bar`` or ``except E as err:``. Indeed,
  the pattern ``P as x`` does not assign the pattern ``P`` to ``x``, but
  rather the subject that successfully matches ``P``.

- ``as`` allows for a more consistent data flow from left to right (the
  attributes in Class patterns also follow a left-to-right data flow).

- The walrus operator looks very similar to the syntax for matching
  attributes in the Class pattern, potentially leading to some confusion.

**Example** using the AS pattern::

  def simplify_expr(tokens):
      match tokens:
          case [('('|'[') as l, *expr, (')'|']') as r] if (l+r) in ('()', '[]'):
              return simplify_expr(expr)
          case [0, ('+'|'-') as op, right]:
              return UnaryOp(op, right)
          case [(int() | float() as left) | Num(left), '+', (int() | float() as right) | Num(right)]:
              return Num(left + right)
          case [(int() | float()) as value]:
              return Num(value)


OR Patterns
~~~~~~~~~~~

The OR pattern allows you to combine 'structurally equivalent' alternatives
into a new pattern, i.e. several patterns can share a common handler. If any
of an OR pattern's subpatterns matches the subject, the entire OR
pattern succeeds.

Statically typed languages prohibit the binding of names (capture patterns)
inside an OR pattern because of potential conflicts concerning the types of
variables. As a dynamically typed language, Python can be less restrictive
here and allow capture patterns inside OR patterns. However, each subpattern
must bind the same set of variables so as not to leave potentially undefined
names. With two alternatives ``P | Q``, this means that if *P* binds the
variables *u* and *v*, *Q* must bind exactly the same variables *u* and *v*.

There was some discussion on whether to use the bar symbol ``|`` or the ``or``
keyword to separate alternatives. The OR pattern does not fully fit
the existing semantics and usage of either of these two symbols. However,
``|`` is the symbol of choice in all programming languages with support of
the OR pattern and is used in that capacity for regular expressions in
Python as well. It is also the traditional separator between alternatives
in formal grammars (including Python's).
Moreover, ``|`` is not only used for bitwise OR, but also
for set unions and dict merging (:pep:`584`).

Other alternatives were considered as well, but none of these would allow
OR-patterns to be nested inside other patterns:

- *Using a comma*::

    case 401, 403, 404:
        print("Some HTTP error")

  This looks too much like a tuple -- we would have to find a different way
  to spell tuples, and the construct would have to be parenthesized inside
  the argument list of a class pattern. In general, commas already have many
  different meanings in Python; we shouldn't add more.

- *Using stacked cases*::

    case 401:
    case 403:
    case 404:
        print("Some HTTP error")

  This is how this would be done in *C*, using its fall-through semantics
  for cases. However, we don't want to mislead people into thinking that
  match/case uses fall-through semantics (which are a common source of bugs
  in *C*). Also, this would be a novel indentation pattern, which might make
  it harder to support in IDEs and such (it would break the simple rule "add
  an indentation level after a line ending in a colon"). Finally, this
  would not support OR patterns nested inside other patterns, either.

- *Using "case in" followed by a comma-separated list*::

    case in 401, 403, 404:
        print("Some HTTP error")

  This would not work for OR patterns nested inside other patterns, like::

    case Point(0|1, 0|1):
        print("A corner of the unit square")


**AND and NOT Patterns**

Since this proposal defines an OR-pattern (``|``) to match one of several
alternatives, why not also an AND-pattern (``&``) or even a NOT-pattern
(``!``)? This question arises especially given that some other languages
(``F#`` for example) support AND-patterns.

However, it is not clear how useful this would be. The semantics for matching
dictionaries, objects and sequences already incorporate an implicit 'and':
all attributes and elements mentioned must be present for the match to
succeed. Guard conditions can also support many of the use cases that a
hypothetical 'and' operator would be used for.

A negation of a match pattern using the operator ``!`` as a prefix
would match exactly if the pattern itself does not match. For
instance, ``!(3 | 4)`` would match anything except ``3`` or ``4``.
However, there is `evidence from other languages
<https://dl.acm.org/doi/abs/10.1145/2480360.2384582>`_ that this is
rarely useful, and primarily used as double negation ``!!`` to control
variable scopes and prevent variable bindings (which does not apply to
Python). Other use cases are better expressed using guards.

In the end, it was decided that this would make the syntax more complex
without adding a significant benefit. It can always be added later.

**Example** using the OR pattern::

  def simplify(expr):
      match expr:
          case ('/', 0, 0):
              return expr
          case ('*'|'/', 0, _):
              return 0
          case ('+'|'-', x, 0) | ('+', 0, x) | ('*', 1, x) | ('*'|'/', x, 1):
              return x
      return expr


.. _literal_pattern:

Literal Patterns
~~~~~~~~~~~~~~~~

Literal patterns are a convenient way of imposing constraints on the
value of a subject, rather than its type or structure. They also
allow you to emulate a switch statement using pattern matching.

Generally, the subject is compared to a literal pattern by means of standard
equality (``x == y`` in Python syntax). Consequently, the literal patterns
``1.0`` and ``1`` match exactly the same set of objects, i.e. ``case 1.0:``
and ``case 1:`` are fully interchangeable. In principle, ``True`` would also
match the same set of objects because ``True == 1`` holds. However, we
believe that many users would be surprised to find that ``case True:``
matched the subject ``1.0``, resulting in some subtle bugs and convoluted
workarounds. We therefore adopted the rule that the three singleton
patterns ``None``, ``False`` and ``True`` match by identity (``x is y`` in
Python syntax) rather than equality. Hence, ``case True:`` will match only
``True`` and nothing else. Note that ``case 1:`` would still match ``True``,
though, because the literal pattern ``1`` works by equality and not identity.

Early ideas to induce a hierarchy on numbers so that ``case 1.0`` would
match both the integer ``1`` and the floating point number ``1.0``, whereas
``case 1:`` would only match the integer ``1`` were eventually dropped in
favor of the simpler and more consistent rule based on equality. Moreover, any
additional checks whether the subject is an instance of ``numbers.Integral``
would come at a high runtime cost to introduce what would essentially be
a novel idea in Python. When needed, the explicit syntax ``case int(1):`` can
be used.

Recall that literal patterns are *not* expressions, but directly
denote a specific value. From a pragmatic point of view, we want to
allow using negative and even complex values as literal patterns, but
they are not atomic literals (only unsigned real and imaginary numbers
are). E.g., ``-3+4j`` is syntactically an expression of the form
``BinOp(UnaryOp('-', 3), '+', 4j)``. Since expressions are not part
of patterns, we had to add explicit syntactic support for such values
without having to resort to full expressions.

Interpolated *f*-strings, on the other hand, are not literal values,
despite their appearance, and therefore cannot be used as literal
patterns (string concatenation, however, is supported).

Literal patterns not only occur as patterns in their own right, but also
as keys in *mapping patterns*.


**Range matching patterns.**
This would allow patterns such as ``1...6``. However, there are a host of
ambiguities:

* Is the range open, half-open, or closed? (I.e. is ``6`` included in the
  above example or not?)
* Does the range match a single number, or a range object?
* Range matching is often used for character ranges ('a'...'z') but that
  won't work in Python since there's no character data type, just strings.
* Range matching can be a significant performance optimization if you can
  pre-build a jump table, but that's not generally possible in Python due
  to the fact that names can be dynamically rebound.

Rather than creating a special-case syntax for ranges, it was decided
that allowing custom pattern objects (``InRange(0, 6)``) would be more flexible
and less ambiguous; however those ideas have been postponed for the time
being.

**Example** using Literal patterns::

  def simplify(expr):
      match expr:
          case ('+', 0, x):
              return x
          case ('+' | '-', x, 0):
              return x
          case ('and', True, x):
              return x
          case ('and', False, x):
              return False
          case ('or', False, x):
              return x
          case ('or', True, x):
              return True
          case ('not', ('not', x)):
              return x
      return expr


.. _capture_pattern:

Capture Patterns
~~~~~~~~~~~~~~~~

Capture patterns take on the form of a name that accepts any value and binds
it to a (local) variable (unless the name is declared as ``nonlocal`` or
``global``). In that sense, a capture pattern is similar
to a parameter in a function definition (when the function is called, each
parameter binds the respective argument to a local variable in the function's
scope).

A name used for a capture pattern must not coincide with another capture
pattern in the same pattern. This, again, is similar to parameters, which
equally require each parameter name to be unique within the list of
parameters. It differs, however, from iterable unpacking assignment, where
the repeated use of a variable name as target is permissible (e.g.,
``x, x = 1, 2``). The rationale for not supporting ``(x, x)`` in patterns
is its ambiguous reading: it could be seen as in iterable unpacking where
only the second binding to ``x`` survives. But it could be equally seen as
expressing a tuple with two equal elements (which comes with its own issues).
Should the need arise, then it is still possible to introduce support for
repeated use of names later on.

There were calls to explicitly mark capture patterns and thus identify them
|
|
as binding targets. According to that idea, a capture pattern would be
|
|
written as, e.g. ``?x``, ``$x`` or ``=x``. The aim of such explicit capture
|
|
markers is to let an unmarked name be a value pattern (see below).
|
|
However, this is based on the misconception that pattern matching was an
|
|
extension of *switch* statements, placing the emphasis on fast switching based
|
|
on (ordinal) values. Such a *switch* statement has indeed been proposed for
|
|
Python before (see :pep:`275` and :pep:`3103`). Pattern matching, on the other
|
|
hand, builds a generalized concept of iterable unpacking. Binding values
|
|
extracted from a data structure is at the very core of the concept and hence
|
|
the most common use case. Explicit markers for capture patterns would thus
|
|
betray the objective of the proposed pattern matching syntax and simplify
|
|
a secondary use case at the expense of additional syntactic clutter for
|
|
core cases.
|
|
|
|
It has been proposed that capture patterns are not needed at all,
|
|
since the equivalent effect can be obtained by combining an AS
|
|
pattern with a wildcard pattern (e.g., ``case _ as x`` is equivalent
|
|
to ``case x``). However, this would be unpleasantly verbose,
|
|
especially given that we expect capture patterns to be very common.
|
|
|
|
**Example** using Capture patterns::
|
|
|
|
    def average(*args):
        match args:
            case [x, y]:           # captures the two elements of a sequence
                return (x + y) / 2
            case [x]:              # captures the only element of a sequence
                return x
            case []:
                return 0
            case a:                # captures the entire sequence
                return sum(a) / len(a)
|
|
|
|
|
|
.. _wildcard_pattern:
|
|
|
|
Wildcard Pattern
|
|
~~~~~~~~~~~~~~~~
|
|
|
|
The wildcard pattern is a special case of a 'capture' pattern: it accepts
|
|
any value, but does not bind it to a variable. The idea behind this rule
|
|
is to support repeated use of the wildcard in patterns. While ``(x, x)``
|
|
is an error, ``(_, _)`` is legal.
|
|
|
|
Particularly in larger (sequence) patterns, it is important to allow the
|
|
pattern to concentrate on values with actual significance while ignoring
|
|
anything else. Without a wildcard, it would become necessary to 'invent'
|
|
a number of local variables, which would be bound but never used. Even
|
|
when sticking to naming conventions and using e.g. ``_1, _2, _3`` to name
|
|
irrelevant values, say, this still introduces visual clutter and can hurt
|
|
performance (compare the sequence pattern ``(x, y, *z)`` to ``(_, y, *_)``,
|
|
where the ``*z`` forces the interpreter to copy a potentially very long
|
|
sequence, whereas the second version simply compiles to code along the
|
|
lines of ``y = seq[1]``).
|
|
|
|
There has been much discussion about the choice of the underscore ``_``
|
|
as a wildcard pattern, i.e. making this one name non-binding. However, the
|
|
underscore is already heavily used as an 'ignore value' marker in iterable
|
|
unpacking. Since the wildcard pattern ``_`` never binds, this use of the
|
|
underscore does not interfere with other uses such as inside the REPL or
|
|
the ``gettext`` module.
|
|
|
|
It has been proposed to use ``...`` (i.e., the ellipsis token) or ``*``
|
|
(star) as a wildcard. However, both these look as if an arbitrary number
|
|
of items is omitted::
|
|
|
|
    case [a, ..., z]: ...
    case [a, *, z]: ...
|
|
|
|
Either example looks like it would match a sequence of two or more
|
|
items, capturing the first and last values. While that may be the
|
|
ultimate "wildcard", it does not convey the desired semantics.
|
|
|
|
An alternative that does not suggest an arbitrary number of items
|
|
would be ``?``. This is even being proposed independently from
|
|
pattern matching in PEP 640. We feel however that using ``?`` as a
|
|
special "assignment" target is likely more confusing to Python users
|
|
than using ``_``. It violates Python's (admittedly vague) principle
|
|
of using punctuation characters only in ways similar to how they are
|
|
used in common English usage or in high school math, unless the usage
|
|
is *very* well established in other programming languages (like, e.g.,
|
|
using a dot for member access).
|
|
|
|
The question mark fails on both counts: its use in other programming
|
|
languages is a grab-bag of usages only vaguely suggested by the idea
|
|
of a "question". For example, it means "any character" in shell
|
|
globbing, "maybe" in regular expressions, "conditional expression" in
|
|
C and many C-derived languages, "predicate function" in Scheme,
|
|
"modify error handling" in Rust, "optional argument" and "optional
|
|
chaining" in TypeScript (the latter meaning has also been proposed for
|
|
Python by PEP 505). An as yet unnamed PEP proposes it to mark
|
|
optional types, e.g. ``int?``.
|
|
|
|
Another common use of ``?`` in programming systems is "help", for
|
|
example, in IPython and Jupyter Notebooks and many interactive
|
|
command-line utilities.
|
|
|
|
In addition, this would put Python in a rather unique position:
|
|
The underscore is used as a wildcard pattern in *every*
|
|
programming language with pattern matching that we could find
|
|
(including *C#*, *Elixir*, *Erlang*, *F#*, *Grace*, *Haskell*,
|
|
*Mathematica*, *OCaml*, *Ruby*, *Rust*, *Scala*, *Swift*, and *Thorn*).
|
|
Keeping in mind that many users of Python also work with other programming
|
|
languages, have prior experience when learning Python, and may move on to
|
|
other languages after having learned Python, we find that such
|
|
well-established standards are important and relevant with respect to
|
|
readability and learnability. In our view, concerns that this wildcard
|
|
means that a regular name receives special treatment are not strong
|
|
enough to introduce syntax that would make Python special.
|
|
|
|
*Else blocks.* A case block without a guard whose pattern is a single
|
|
wildcard (i.e., ``case _:``) accepts any subject without binding it to
|
|
a variable or performing any other operation. It is thus semantically
|
|
equivalent to ``else:``, if it were supported. However, adding such
|
|
an else block to the match statement syntax would not remove the need
|
|
for the wildcard pattern in other contexts. Another argument against
|
|
this is that there would be two plausible indentation levels for an
|
|
else block: aligned with ``case`` or aligned with ``match``. The
|
|
authors have found it quite contentious which indentation level to
|
|
prefer.
|
|
|
|
**Example** using the Wildcard pattern::
|
|
|
|
    def is_closed(sequence):
        match sequence:
            case [_]:               # any sequence with a single element
                return True
            case [start, *_, end]:  # a sequence with at least two elements
                return start == end
            case _:                 # anything
                return False
|
|
|
|
|
|
.. _value_pattern:
|
|
|
|
Value Patterns
|
|
~~~~~~~~~~~~~~
|
|
|
|
It is good programming style to use named constants for parametric values or
|
|
to clarify the meaning of particular values. Clearly, it would be preferable
|
|
to write ``case (HttpStatus.OK, body):`` over
|
|
``case (200, body):``, for example. The main issue that arises here is how to
|
|
distinguish capture patterns (variable bindings) from value patterns. The
|
|
general discussion surrounding this issue has brought forward a plethora of
|
|
options, which we cannot all fully list here.
|
|
|
|
Strictly speaking, value patterns are not really necessary, but
|
|
could be implemented using guards, i.e.
|
|
``case (status, body) if status == HttpStatus.OK:``. Nonetheless, the
|
|
convenience of value patterns is unquestioned and obvious.
|
|
|
|
The observation that constants tend to be written in uppercase letters or
|
|
collected in enumeration-like namespaces suggests possible rules to discern
|
|
constants syntactically. However, the idea of using upper- vs. lowercase as
|
|
a marker has been met with scepticism since there is no similar precedent
|
|
in core Python (although it is common in other languages). We therefore only
|
|
adopted the rule that any dotted name (i.e., attribute access) is to be
|
|
interpreted as a value pattern, for example ``HttpStatus.OK``
|
|
above. This precludes, in particular, local variables and global
|
|
variables defined in the current module from acting as constants.
|
|
|
|
A proposed rule to use a leading dot (e.g.
|
|
``.CONSTANT``) for that purpose was criticised because it was felt that the
|
|
dot would not be a visible-enough marker for that purpose. Partly inspired
|
|
by forms found in other programming languages, a number of different
|
|
markers/sigils were proposed (such as ``^CONSTANT``, ``$CONSTANT``,
|
|
``==CONSTANT``, ``CONSTANT?``, or the word enclosed in backticks), although
|
|
there was no obvious or natural choice. The current proposal therefore
|
|
leaves the discussion and possible introduction of such a 'constant' marker
|
|
for a future PEP.
|
|
|
|
Distinguishing the semantics of a name based on whether it is a global
|
|
variable (i.e. the compiler would treat global variables as constants rather
|
|
than capture patterns) leads to various issues. The addition or alteration
|
|
of a global variable in the module could have unintended side effects on
|
|
patterns. Moreover, pattern matching could not be used directly inside a
|
|
module's scope because all variables would be global, making capture
|
|
patterns impossible.
|
|
|
|
**Example** using the Value pattern::
|
|
|
|
    def handle_reply(reply):
        match reply:
            case (HttpStatus.OK, MimeType.TEXT, body):
                process_text(body)
            case (HttpStatus.OK, MimeType.APPL_ZIP, body):
                text = deflate(body)
                process_text(text)
            case (HttpStatus.MOVED_PERMANENTLY, new_URI):
                resend_request(new_URI)
            case (HttpStatus.NOT_FOUND):
                raise ResourceNotFound()
|
|
|
|
:pep:`642` proposes using ``?`` as a prefix for *constraint patterns*: arbitrary
|
|
expressions that replace the value and literal patterns defined here. The PEP
|
|
is motivated by a desire to "treat match patterns as a variation on assignment
|
|
targets".
|
|
|
|
We believe that attempting to unify the grammars of assignment targets and
|
|
patterns is attractive, but misguided. Evidence of this is PEP 642's need to
|
|
introduce new syntax (which *neither* assignments nor our proposal requires) as
|
|
a best attempt to unify them. In contrast, consider function parameters and
|
|
iterable unpacking: while they are certainly similar, they each have key
|
|
syntactic incompatibilities that reflect their different purposes.
|
|
|
|
It is also not clear from our research that there is need for value patterns to
|
|
contain arbitrary expressions. Any such dynamic elements can be easily
|
|
expressed using one or more capture patterns together with a guard. Having a
|
|
clear separation between static and dynamic elements of patterns is a benefit,
|
|
not a drawback [1]_.
|
|
|
|
There are deeper issues as well. Even if the grammars for both assignments and
|
|
patterns are made "consistent" with one another, strings, byte-strings,
|
|
mappings, sequences, and iterators will all behave differently in both contexts.
|
|
This leaves us in a worse situation: one where the *grammar* is consistent, but
|
|
the *behavior* differs in meaningful ways.
|
|
|
|
The proposed syntax (prefixing arbitrary expressions with ``?``) is also quite
|
|
ugly and distracting, harming readability. (Our issues with using ``?`` as a
|
|
wildcard symbol here have already been enumerated above.)
|
|
|
|
|
|
Group Patterns
|
|
~~~~~~~~~~~~~~
|
|
|
|
Allowing users to explicitly specify the grouping is particularly helpful
|
|
in case of OR patterns.
|
|
|
|
|
|
.. _sequence_pattern:
|
|
|
|
Sequence Patterns
|
|
~~~~~~~~~~~~~~~~~
|
|
|
|
Sequence patterns follow as closely as possible the already established
|
|
syntax and semantics of iterable unpacking. Of course, subpatterns take
|
|
the place of assignment targets (variables, attributes and subscript).
|
|
Moreover, the sequence pattern only matches a carefully selected set of
|
|
possible subjects, whereas iterable unpacking can be applied to any
|
|
iterable.
|
|
|
|
- As in iterable unpacking, we do not distinguish between 'tuple' and
  'list' notation. ``[a, b, c]``, ``(a, b, c)`` and ``a, b, c`` are all
  equivalent. While this means we have a redundant notation and checking
  specifically for lists or tuples requires more effort (e.g.
  ``case list([a, b, c])``), we mimic iterable unpacking as much as
  possible.

- A starred pattern will capture a sub-sequence of arbitrary length,
  again mirroring iterable unpacking. Only one starred item may be
  present in any sequence pattern. In theory, patterns such as ``(*_, 3, *_)``
  could be understood as expressing any sequence containing the value ``3``.
  In practice, however, this would only work for a very narrow set of use
  cases and lead to inefficient backtracking or even ambiguities otherwise.

- The sequence pattern does *not* iterate through an iterable subject. All
  elements are accessed through subscripting and slicing, and the subject must
  be an instance of ``collections.abc.Sequence``. This includes, of course,
  lists and tuples, but excludes e.g. sets and dictionaries. While it would
  include strings and bytes, we make an exception for these (see below).
|
|
|
|
A sequence pattern cannot just iterate through any iterable object. The
|
|
consumption of elements from the iteration would have to be undone if the
|
|
overall pattern fails, which is not feasible.
|
|
|
|
To identify sequences we cannot rely on ``len()`` and subscripting and
|
|
slicing alone, because sequences share these protocols with mappings
|
|
(e.g. ``dict``) in this regard. It would be surprising if a sequence
|
|
pattern also matched dictionaries or other objects implementing
|
|
the mapping protocol (i.e. ``__getitem__``). The interpreter therefore
|
|
performs an instance check to ensure that the subject in question really
|
|
is a sequence (of known type). (As an optimization of the most common
|
|
case, if the subject is exactly a list or a tuple, the instance check
|
|
can be skipped.)
|
|
|
|
String and bytes objects have a dual nature: they are both 'atomic' objects
|
|
in their own right, as well as sequences (with a strongly recursive nature
|
|
in that a string is a sequence of strings). The typical behavior and use
|
|
cases for strings and bytes are different enough from those of tuples and
|
|
lists to warrant a clear distinction. It is in fact often unintuitive and
|
|
unintended that strings pass for sequences, as evidenced by regular questions
|
|
and complaints. Strings and bytes are therefore not matched by a sequence
|
|
pattern, limiting the sequence pattern to a very specific understanding of
|
|
'sequence'. The built-in ``bytearray`` type, being a mutable version of
|
|
``bytes``, also deserves an exception; but we don't intend to
|
|
enumerate all other types that may be used to represent bytes
|
|
(e.g. some, but not all, instances of ``memoryview`` and ``array.array``).
|
|
|
|
|
|
.. _mapping_pattern:
|
|
|
|
Mapping Patterns
|
|
~~~~~~~~~~~~~~~~
|
|
|
|
Dictionaries or mappings in general are one of the most important and most
|
|
widely used data structures in Python. In contrast to sequences, mappings
|
|
are built for fast direct access to arbitrary elements identified by a key.
|
|
In most cases an element is retrieved from a dictionary by a known key
|
|
without regard for any ordering or other key-value pairs stored in the same
|
|
dictionary. Particularly common are string keys.
|
|
|
|
The mapping pattern reflects the common usage of dictionary lookup: it allows
|
|
the user to extract some values from a mapping by means of constant/known
|
|
keys and have the values match given subpatterns.
|
|
Extra keys in the subject are ignored even if ``**rest`` is not present.
|
|
This is different from sequence patterns, where extra items will cause a
|
|
match to fail. But mappings are actually different from sequences: they
|
|
have natural structural sub-typing behavior, i.e., passing a dictionary
|
|
with extra keys somewhere will likely just work.
|
|
Should it be
|
|
necessary to impose an upper bound on the mapping and ensure that no
|
|
additional keys are present, then the usual double-star-pattern ``**rest``
|
|
can be used. The special case ``**_`` with a wildcard, however, is not
|
|
supported as it would not have any effect, but might lead to an incorrect
|
|
understanding of the mapping pattern's semantics.
|
|
|
|
To avoid overly expensive matching algorithms, keys must be literals or
|
|
value patterns.
|
|
|
|
There is a subtle reason for using ``get(key, default)`` instead of
|
|
``__getitem__(key)`` followed by a check for ``KeyError``: if
|
|
the subject happens to be a ``defaultdict``, calling ``__getitem__``
|
|
for a non-existent key would add the key. Using ``get()`` avoids this
|
|
unexpected side effect.
|
|
|
|
**Example** using the Mapping pattern::
|
|
|
|
    def change_red_to_blue(json_obj):
        match json_obj:
            case { 'color': ('red' | '#FF0000') }:
                json_obj['color'] = 'blue'
            case { 'children': children }:
                for child in children:
                    change_red_to_blue(child)
|
|
|
|
|
|
.. _class_pattern:
|
|
|
|
Class Patterns
|
|
~~~~~~~~~~~~~~
|
|
|
|
Class patterns fulfill two purposes: checking whether a given subject is
|
|
indeed an instance of a specific class, and extracting data from specific
|
|
attributes of the subject. Anecdotal evidence revealed that ``isinstance()``
|
|
is one of the most often used functions in Python in terms of
|
|
static occurrences in programs. Such instance checks typically precede
|
|
a subsequent access to information stored in the object, or a possible
|
|
manipulation thereof. A typical pattern might be along the lines of::
|
|
|
|
    def traverse_tree(node):
        if isinstance(node, Node):
            traverse_tree(node.left)
            traverse_tree(node.right)
        elif isinstance(node, Leaf):
            print(node.value)
|
|
|
|
In many cases class patterns occur nested, as in the example
|
|
given in the motivation::
|
|
|
|
    if (isinstance(node, BinOp) and node.op == "+"
            and isinstance(node.right, BinOp) and node.right.op == "*"):
        a, b, c = node.left, node.right.left, node.right.right
        # Handle a + b*c
|
|
|
|
The class pattern lets you concisely specify both an instance check
|
|
and relevant attributes (with possible further constraints). It is
|
|
thereby very tempting to write, e.g., ``case Node(left, right):`` in the
|
|
first case above and ``case Leaf(value):`` in the second. While this
|
|
indeed works well for languages with strict algebraic data types, it is
|
|
problematic with the structure of Python objects.
|
|
|
|
When dealing with general Python objects, we face a potentially very large
|
|
number of unordered attributes: an instance of ``Node`` contains a large
|
|
number of attributes (most of which are 'special methods' such as
|
|
``__repr__``). Moreover, the interpreter cannot reliably deduce the
|
|
ordering of attributes. For an object that
|
|
represents a circle, say, there is no inherently obvious ordering of the
|
|
attributes ``x``, ``y`` and ``radius``.
|
|
|
|
We envision two possibilities for dealing with this issue: either explicitly
|
|
name the attributes of interest, or provide an additional mapping that tells
|
|
the interpreter which attributes to extract and in which order. Both
|
|
approaches are supported. Moreover, explicitly naming the attributes of
|
|
interest lets you further specify the required structure of an object; if
|
|
an object lacks an attribute specified by the pattern, the match fails.
|
|
|
|
- Attributes that are explicitly named pick up the syntax of named arguments.
  If an object of class ``Node`` has two attributes ``left`` and ``right``
  as above, the pattern ``Node(left=x, right=y)`` will extract the values of
  both attributes and assign them to ``x`` and ``y``, respectively. The data
  flow from left to right seems unusual, but is in line with mapping patterns
  and has precedents such as assignments via ``as`` in *with*- or
  *import*-statements (and indeed AS patterns).

  Naming the attributes in question explicitly will be mostly used for more
  complex cases where the positional form (below) is insufficient.

- The class field ``__match_args__`` specifies a number of attributes
  together with their ordering, allowing class patterns to rely on positional
  sub-patterns without having to explicitly name the attributes in question.
  This is particularly handy for smaller objects or instances of data classes,
  where the attributes of interest are rather obvious and often have a
  well-defined ordering. In a way, ``__match_args__`` is similar to the
  declaration of formal parameters, which allows calling functions with
  positional arguments rather than naming all the parameters.

  This is a class attribute, because it needs to be looked up on the class
  named in the class pattern, not on the subject instance.
|
|
|
|
|
|
The syntax of class patterns is based on the idea that de-construction
|
|
mirrors the syntax of construction. This is already the case in virtually
|
|
any Python construct, be it assignment targets, function definitions or
|
|
iterable unpacking. In all these cases, we find that the syntax for
|
|
sending and that for receiving 'data' are virtually identical.
|
|
|
|
- Assignment targets such as variables, attributes and subscripts:
  ``foo.bar[2] = foo.bar[3]``;

- Function definitions: a function defined with ``def foo(x, y, z=6)``
  is called as, e.g., ``foo(123, y=45)``, where the actual arguments
  provided at the call site are matched against the formal parameters
  at the definition site;

- Iterable unpacking: ``a, b = b, a`` or ``[a, b] = [b, a]`` or
  ``(a, b) = (b, a)``, just to name a few equivalent possibilities.
|
|
|
|
Using the same syntax for reading and writing, l- and r-values, or
|
|
construction and de-construction is widely accepted for its benefits in
|
|
thinking about data, its flow and manipulation. This equally extends to
|
|
the explicit construction of instances, where class patterns ``C(p, q)``
|
|
deliberately mirror the syntax of creating instances.
|
|
|
|
The special case for the built-in classes ``bool``, ``bytearray``
|
|
etc. (where e.g. ``str(x)`` captures the subject value in ``x``) can
|
|
be emulated by a user-defined class as follows::
|
|
|
|
    class MyClass:
        __match_args__ = ("__myself__",)
        __myself__ = property(lambda self: self)
|
|
|
|
|
|
**Type annotations for pattern variables.**
|
|
The proposal was to combine patterns with type annotations::
|
|
|
|
    match x:
        case [a: int, b: str]: print(f"An int {a} and a string {b}:")
        case [a: int, b: int, c: int]: print("Three ints", a, b, c)
        ...
|
|
|
|
This idea has a lot of problems. For one, the colon can only
|
|
be used inside of brackets or parentheses, otherwise the syntax becomes
|
|
ambiguous. And because Python disallows ``isinstance()`` checks
|
|
on generic types, type annotations containing generics will not
|
|
work as expected.
|
|
|
|
|
|
History and Context
|
|
===================
|
|
|
|
Pattern matching emerged in the late 1970s in the form of tuple unpacking
|
|
and as a means to handle recursive data structures such as linked lists or
|
|
trees (object-oriented languages usually use the visitor pattern for handling
|
|
recursive data structures). The early proponents of pattern matching
|
|
organised structured data in 'tagged tuples' rather than ``struct`` as in
|
|
*C* or the objects introduced later. A node in a binary tree would, for
|
|
instance, be a tuple with two elements for the left and right branches,
|
|
respectively, and a ``Node`` tag, written as ``Node(left, right)``. In
|
|
Python we would probably put the tag inside the tuple as
|
|
``('Node', left, right)`` or define a data class ``Node`` to achieve the
|
|
same effect.
|
|
|
|
Using modern syntax, a depth-first tree traversal would then be written as
|
|
follows::
|
|
|
|
    def traverse(node):
        match node:
            case Node(left, right):
                traverse(left)
                traverse(right)
            case Leaf(value):
                handle(value)
|
|
|
|
The notion of handling recursive data structures with pattern matching
|
|
immediately gave rise to the idea of handling more general recursive
|
|
'patterns' (i.e. recursion beyond recursive data structures)
|
|
with pattern matching. Pattern matching would thus also be used to define
|
|
recursive functions such as::
|
|
|
|
    def fib(arg):
        match arg:
            case 0:
                return 1
            case 1:
                return 1
            case n:
                return fib(n-1) + fib(n-2)
|
|
|
|
As pattern matching was repeatedly integrated into new and emerging
|
|
programming languages, its syntax slightly evolved and expanded. The two
|
|
first cases in the ``fib`` example above could be written more succinctly
|
|
as ``case 0 | 1:`` with ``|`` denoting alternative patterns. Moreover, the
|
|
underscore ``_`` was widely adopted as a wildcard, a filler where neither
|
|
the structure nor value of parts of a pattern were of substance. Since the
|
|
underscore is already frequently used in equivalent capacity in Python's
|
|
iterable unpacking (e.g., ``_, _, third, *_ = something``) we kept these
|
|
universal standards.
|
|
|
|
It is noteworthy that the concept of pattern matching has always been
|
|
closely linked to the concept of functions. The different case clauses
|
|
have always been considered as something like semi-independent functions
|
|
where pattern variables take on the role of parameters. This becomes
|
|
most apparent when pattern matching is written as an overloaded function,
|
|
along the lines of (Standard ML)::
|
|
|
|
    fun fib 0 = 1
      | fib 1 = 1
      | fib n = fib (n-1) + fib (n-2)
|
|
|
|
Even though such a strict separation of case clauses into independent
|
|
functions does not apply in Python, we find that patterns share many
|
|
syntactic rules with parameters, such as binding arguments to unqualified
|
|
names only or that variable/parameter names must not be repeated for
|
|
a particular pattern/function.
|
|
|
|
With its emphasis on abstraction and encapsulation, object-oriented
|
|
programming posed a serious challenge to pattern matching. In short: in
|
|
object-oriented programming, we can no longer view objects as tagged tuples.
|
|
The arguments passed into the constructor do not necessarily specify the
|
|
attributes or fields of the objects. Moreover, there is no longer a strict
|
|
ordering of an object's fields and some of the fields might be private and
|
|
thus inaccessible. And on top of this, the given object might actually be
|
|
an instance of a subclass with slightly different structure.
|
|
|
|
To address this challenge, patterns became increasingly independent of the
|
|
original tuple constructors. In a pattern like ``Node(left, right)``,
|
|
``Node`` is no longer a passive tag, but rather a function that can actively
|
|
check for any given object whether it has the right structure and extract a
|
|
``left`` and ``right`` field. In other words: the ``Node``-tag becomes a
|
|
function that transforms an object into a tuple or returns some failure
|
|
indicator if it is not possible.
|
|
|
|
In Python, we simply use ``isinstance()`` together with the ``__match_args__``
|
|
field of a class to check whether an object has the correct structure and
|
|
then transform some of its attributes into a tuple. For the ``Node`` example
|
|
above, for instance, we would have ``__match_args__ = ('left', 'right')`` to
|
|
indicate that these two attributes should be extracted to form the tuple.
|
|
That is, ``case Node(x, y)`` would first check whether a given object is an
|
|
instance of ``Node`` and then assign ``left`` to ``x`` and ``right`` to ``y``,
|
|
respectively.
|
|
|
|
Paying tribute to Python's dynamic nature with 'duck typing', however, we
|
|
also added a more direct way to specify the presence of, or constraints on
|
|
specific attributes. Instead of ``Node(x, y)`` you could also write
|
|
``object(left=x, right=y)``, effectively eliminating the ``isinstance()``
|
|
check and thus supporting any object with ``left`` and ``right`` attributes.
|
|
Or you would combine these ideas to write ``Node(right=y)`` so as to require
|
|
an instance of ``Node`` but only extract the value of the ``right`` attribute.
|
|
|
|
|
|
Backwards Compatibility
|
|
=======================
|
|
|
|
Through its use of "soft keywords" and the new PEG parser (PEP 617),
|
|
the proposal remains fully backwards compatible. However, 3rd party
|
|
tooling that uses an LL(1) parser to parse Python source code may be
|
|
forced to switch parser technology to be able to support those same
|
|
features.
|
|
|
|
|
|
Security Implications
|
|
=====================
|
|
|
|
We do not expect any security implications from this language feature.
|
|
|
|
|
|
Reference Implementation
|
|
========================
|
|
|
|
A `feature-complete CPython implementation
|
|
<https://github.com/brandtbucher/cpython/tree/patma>`_ is available on
|
|
GitHub.
|
|
|
|
An `interactive playground
|
|
<https://mybinder.org/v2/gh/gvanrossum/patma/master?urlpath=lab/tree/playground-622.ipynb>`_
|
|
based on the above implementation was created using Binder [2]_ and Jupyter [3]_.
|
|
|
|
|
|
References
|
|
==========
|
|
|
|
.. [1] Kohn et al., Dynamic Pattern Matching with Python
   https://gvanrossum.github.io/docs/PyPatternMatching.pdf

.. [2] Binder
   https://mybinder.org

.. [3] Jupyter
   https://jupyter.org
|
|
|
|
|
|
Copyright
|
|
=========
|
|
|
|
This document is placed in the public domain or under the
|
|
CC0-1.0-Universal license, whichever is more permissive.
|
|
|
|
|
|
|