561 lines
21 KiB
ReStructuredText
561 lines
21 KiB
ReStructuredText
PEP: 636
|
|
Title: Structural Pattern Matching: Tutorial
|
|
Version: $Revision$
|
|
Last-Modified: $Date$
|
|
Author: Daniel F Moisset <dfmoisset@gmail.com>,
|
|
Tobias Kohn <kohnt@tobiaskohn.ch>
|
|
Sponsor: Guido van Rossum <guido@python.org>
|
|
BDFL-Delegate:
|
|
Discussions-To: Python-Dev <python-dev@python.org>
|
|
Status: Draft
|
|
Type: Informational
|
|
Content-Type: text/x-rst
|
|
Created: 12-Sep-2020
|
|
Python-Version: 3.10
|
|
Post-History:
|
|
Resolution:
|
|
|
|
|
|
Abstract
|
|
========
|
|
|
|
**NOTE:** This draft is incomplete and not intended for review yet.
|
|
We're checking it into the peps repo for the convenience of the authors.
|
|
|
|
This PEP is a tutorial for the pattern matching introduced by PEP 634.
|
|
|
|
PEP 622 proposed syntax for pattern matching, which received detailed discussion
|
|
both from the community and the Steering Council. A frequent concern was
|
|
about how easy it would be to explain (and learn) this feature. This PEP
|
|
addresses that concern providing the kind of document which developers could use
|
|
to learn about pattern matching in Python.
|
|
|
|
This is considered supporting material for PEP 634 (the technical specification
|
|
for pattern matching) and PEP 635 (the motivation and rationale for having pattern
|
|
matching and design considerations).
|
|
|
|
For readers who are looking more for a quick review than for a tutorial,
|
|
see `Appendix A`_.
|
|
|
|
Meta
|
|
====
|
|
|
|
This section is intended to get in sync about style and language with
|
|
co-authors. It should be removed from the released PEP
|
|
|
|
The following are design decisions I made while writing this:
|
|
|
|
1. Who is the target audience?
|
|
I'm considering "People with general Python experience" (i.e. who shouldn't be surprised
|
|
at anything in the Python tutorial), but not necessarily involved with the
|
|
design/development or Python. I'm assuming someone who hasn't been exposed to pattern
|
|
matching in other languages.
|
|
|
|
2. How detailed should this document be?
|
|
I considered a range from "very superficial" (like the detail level you might find about
|
|
statements in the Python tutorial) to "terse but complete" like
|
|
https://github.com/gvanrossum/patma/#tutorial
|
|
to "long and detailed". I chose the later, we can always trim down from that.
|
|
|
|
3. What kind of examples to use?
|
|
I tried to write examples that are could that I might write using pattern matching. I
|
|
avoided going
|
|
for a full application (because the examples I have in mind are too large for a PEP) but
|
|
I tried to follow ideas related to a single project to thread the story-telling more
|
|
easily. This is probably the most controversial thing here, and if the rest of
|
|
the authors dislike it, we can change to a more formal explanatory style.
|
|
|
|
Other rules I'm following (let me know if I forgot to):
|
|
|
|
* I'm not going to reference/compare with other languages
|
|
* I'm not trying to convince the reader that this is a good idea (that's the job of
|
|
PEP 635) just explain how to use it
|
|
* I'm not trying to cover every corner case (that's the job of PEP 634), just cover
|
|
how to use the full functionality in the "normal" cases.
|
|
* I talk to the learner in second person
|
|
|
|
Tutorial
|
|
========
|
|
|
|
As an example to motivate this tutorial, you will be writing a text-adventure. That is
|
|
a form of interactive fiction where the user enters text commands to interact with a
|
|
fictional world and receives text descriptions of what happens. Commands will be
|
|
simplified forms of natural language like ``get sword``, ``attack dragon``, ``go north``,
|
|
``enter shop`` or ``buy cheese``.
|
|
|
|
Matching sequences
|
|
------------------
|
|
|
|
Your main loop will need to get input from the user and split it into words, let's say
|
|
a list of strings like this::
|
|
|
|
command = input("What are you doing next? ")
|
|
# analyze the result of command.split()
|
|
|
|
The next step is to interpret the words. Most of our commands will have two words: an
|
|
action and an object. So you may be tempted to do the following::
|
|
|
|
[action, obj] = command.split()
|
|
... # interpret action, obj
|
|
|
|
The problem with that line of code is that it's missing something: what if the user
|
|
types more or fewer than 2 words? To prevent this problem you can either check the length
|
|
of the list of words, or capture the ``ValueError`` that the statement above would raise.
|
|
|
|
You can use a matching statement instead::
|
|
|
|
match command.split():
|
|
case [action, obj]:
|
|
... # interpret action, obj
|
|
|
|
The ``match`` statement evaluates the **subject** after the ``match`` keyword, and checks
|
|
it against the **pattern** next to ``case``. A pattern is able to do two different
|
|
things:
|
|
|
|
* Verify that the subject has certain structure. In your case, the ``[action, obj]``
|
|
pattern matches any sequence of exactly two elements. This is called **matching**
|
|
* It will bind some names in the pattern to component elements of your subject. In
|
|
this case, if the list has two elements, it will bind ``action = subject[0]`` and
|
|
``obj = subject[1]``. This is called **destructuring**
|
|
|
|
If there's a match, the statements inside the ``case`` clause will be executed with the
|
|
bound variables. If there's no match, nothing happens and the next statement after
|
|
``match`` keeps running.
|
|
|
|
TODO: discuss other sequences, tuples. Discuss syntax with parenthesis. discuss
|
|
iterators? discuss [x, x] possibly later on?
|
|
|
|
Matching multiple patterns
|
|
--------------------------
|
|
|
|
Even if most commands have the action/object form, you might want to have user commands
|
|
of different lengths. For example you might want to add single verbs with no object like
|
|
``look`` or ``quit``. A match statement can (and is likely to) have more than one
|
|
``case``::
|
|
|
|
match command.split():
|
|
case [action]:
|
|
... # interpret single-verb action
|
|
case [action, obj]:
|
|
... # interpret action, obj
|
|
|
|
The ``match`` statement will check patterns from top to bottom. If the pattern doesn't
|
|
match the subject, the next pattern will be tried. However, once the *first*
|
|
matching ``case`` clause is found, the body of that clause is executed, and all further
|
|
``case`` clauses are ignored. This is similar to the way that an ``if/elif/elif/...``
|
|
statement works.
|
|
|
|
Matching specific values
|
|
------------------------
|
|
|
|
Your code still needs to look at the specific actions and conditionally run
|
|
different logic depending on the specific action (e.g., ``quit``, ``attack``, or ``buy``).
|
|
You could do that using a chain of ``if/elif/elif/...``, or using a dictionary of
|
|
functions, but here we'll leverage pattern matching to solve that task. Instead of a
|
|
variable, you can use literal values in patterns (like ``"quit"``, ``42``, or ``None``).
|
|
This allows you to write::
|
|
|
|
match command.split():
|
|
case ["quit"]:
|
|
print("Goodbye!")
|
|
quit_game()
|
|
case ["look"]:
|
|
current_room.describe()
|
|
case ["get", obj]:
|
|
character.get(obj, current_room)
|
|
case ["go", direction]:
|
|
current_room = current_room.neighbor(direction)
|
|
# The rest of your commands go here
|
|
|
|
A pattern like ``["get", obj]`` will match only 2-element sequences that have a first
|
|
element equal to ``"get"``. When destructuring, it will bind ``obj = subject[1]``.
|
|
|
|
As you can see in the ``go`` case, we also can use different variable names in
|
|
different patterns.
|
|
|
|
FIXME: This *might* be the place to explain a bit that when I say "literal" I mean it
|
|
literally, and a "soft constant" will not work :)
|
|
|
|
Matching slices
|
|
---------------
|
|
|
|
A player may be able to drop multiple objects by using a series of commands
|
|
``drop key``, ``drop sword``, ``drop cheese``. This interface might be cumbersome, and
|
|
you might like to allow dropping multiple items in a single command, like
|
|
``drop key sword cheese``. In this case you don't know beforehand how many words will
|
|
be in the command, but you can use extended unpacking in patterns in the same way that
|
|
they are allowed in assignments::
|
|
|
|
match command.split():
|
|
case ["drop", *objects]:
|
|
for obj in objects:
|
|
character.drop(obj, current_room)
|
|
# The rest of your commands go here
|
|
|
|
This will match any sequences having "drop" as its first elements. All remaining
|
|
elements will be captured in a ``list`` object which will be bound to the ``objects``
|
|
variable.
|
|
|
|
This syntax has similar restrictions as sequence unpacking: you can not have more than one
|
|
starred name in a pattern.
|
|
|
|
Adding a catch-all
|
|
------------------
|
|
|
|
You may want to print an error message saying that the command wasn't recognized when
|
|
all the patterns fail. You could use the feature we just learned and write the
|
|
following::
|
|
|
|
match command.split():
|
|
case ["quit"]: ... # Code omitted for brevity
|
|
case ["go", direction]: ...
|
|
case ["drop", *objects]: ...
|
|
... # Other case clauses
|
|
case [*ignored_words]:
|
|
print(f"Sorry, I couldn't understand {command!r}")
|
|
|
|
Note that you must add this last pattern at the end, otherwise it will match before other
|
|
possible patterns that could be considered. This works but it's a bit verbose and
|
|
somewhat wasteful: this will make a full copy of the word list, which will be bound to
|
|
``ignored_words`` even if it's never used.
|
|
|
|
You can use an special pattern which is written ``_``, which always matches but it
|
|
doesn't bind anything. which would allow you to rewrite::
|
|
|
|
match command.split():
|
|
... # Other case clauses
|
|
case [*_]:
|
|
print(f"Sorry, I couldn't understand {command!r}")
|
|
|
|
This pattern will match for any sequence. In this case we can simplify even more and
|
|
match any object::
|
|
|
|
match command.split():
|
|
... # Other case clauses
|
|
case _:
|
|
print(f"Sorry, I couldn't understand {command!r}")
|
|
|
|
TODO: Explain about syntaxerror when having an irrefutable pattern above others?
|
|
|
|
How patterns are composed
|
|
-------------------------
|
|
|
|
This is a good moment to step back from the examples and understand how the patterns
|
|
that you have been using are built. Patterns can be nested within each other, and we
|
|
have being doing that implicitly in the examples above.
|
|
|
|
There are some "simple" patterns ("simple" here meaning that they do not contain other
|
|
patterns) that we've seen:
|
|
|
|
* **Literal patterns** (string literals, number literals, ``True``, ``False``, and
|
|
``None``)
|
|
* The **wildcard pattern** ``_``
|
|
* **Capture patterns** (stand-alone names like ``direction``, ``action``, ``objects``). We
|
|
never discussed these separately, but used them as part of other patterns. Note that
|
|
a capture pattern by itself will always match, and usually makes sense only
|
|
as a catch-all at the end of your ``match`` if you desire to bind the name to the
|
|
subject.
|
|
|
|
Until now, the only non-simple pattern we have experimented with is the sequence pattern.
|
|
Each element in a sequence pattern can in fact be
|
|
any other pattern. This means that you could write a pattern like
|
|
``["first", (left, right), *rest]``. This will match subjects which are a sequence of at
|
|
least two elements, where the first one is equal to ``"first"`` and the second one is
|
|
in turn a sequence of two elements. It will also bind ``left=subject[1][0]``,
|
|
``right=subject[1][1]``, and ``rest = subject[2:]``
|
|
|
|
Alternate patterns
|
|
------------------
|
|
|
|
Going back to the adventure game example, you may find that you'd like to have several
|
|
patterns resulting in the same outcome. For example, you might want the commands
|
|
``north`` and ``go north`` be equivalent. You may also desire to have aliases for
|
|
``get X``, ``pick up X`` and ``pick X up`` for any X.
|
|
|
|
The ``|`` symbol in patterns combines them as alternatives. You could for example write::
|
|
|
|
match command.split():
|
|
... # Other case clauses
|
|
case ["north"] | ["go", "north"]:
|
|
current_room = current_room.neighbor("north")
|
|
case ["get", obj] | ["pick", "up", obj] | ["pick", obj, "up"]:
|
|
... # Code for picking up the given object
|
|
|
|
This is called an **or pattern** and will produce the expected result. Patterns are
|
|
attempted from left to right; this may be relevant to know what is bound if more than
|
|
one alternative matches. An important restriction when writing or patterns is that all
|
|
alternatives should bind the same variables. So a pattern ``[1, x] | [2, y]`` is not
|
|
allowed because it would make unclear which variable would be bound after a successful
|
|
match. ``[1, x] | [2, x]`` is perfectly fine and will always bind ``x`` if successful.
|
|
|
|
|
|
Capturing matched sub-patterns
|
|
------------------------------
|
|
|
|
The first version of our "go" command was written with a ``["go", direction]`` pattern.
|
|
The change we did in our last version using the pattern ``["north"] | ["go", "north"]``
|
|
has some benefits but also some drawbacks in comparison: the latest version allows the
|
|
alias, but also has the direction hardcoded, which will force us to actually have
|
|
separate patterns for north/south/east/west. This leads to some code duplication, but at
|
|
the same time we get better input validation, and we will not be getting into that
|
|
branch if the command entered by the user is ``"go figure!"`` instead of an direction.
|
|
|
|
We could try to get the best of both worlds doing the following (I'll omit the aliased
|
|
version without "go" for brevity)::
|
|
|
|
match command.split():
|
|
case ["go", ("north" | "south" | "east" | "west")]:
|
|
current_room = current_room.neighbor(...)
|
|
# how do I know which direction to go?
|
|
|
|
This code is a single branch, and it verifies that the word after "go" is really a
|
|
direction. But the code moving the player around needs to know which one was chosen and
|
|
has no way to do so. What we need is a pattern that behaves like the or pattern but at
|
|
the same time does a capture. We can do so with a **walrus pattern**::
|
|
|
|
match command.split():
|
|
case ["go", direction := ("north" | "south" | "east" | "west")]:
|
|
current_room = current_room.neighbor(direction)
|
|
|
|
The walrus pattern (named like that because the ``:=`` operator looks like a sideways
|
|
walrus) matches whatever pattern is on its right hand side, but also binds the value to
|
|
a name.
|
|
|
|
Adding conditions to patterns
|
|
-----------------------------
|
|
|
|
The patterns we have explored above can do some powerful data filtering, but sometimes
|
|
you may wish for the full power of a boolean expression. Let's say that you would actually
|
|
like to allow a "go" command only in a restricted set of directions based on the possible
|
|
exits from the current_room. We can achieve that by adding a **guard** to our
|
|
case-clause. Guards consist of the ``if`` keyword followed by any expression::
|
|
|
|
match command.split():
|
|
case ["go", direction] if direction in current_room.exits:
|
|
current_room = current_room.neighbor(direction)
|
|
case ["go", _]:
|
|
print("Sorry, you can't go that way")
|
|
|
|
The guard is not part of the pattern, it's part of the case clause. It's only checked if
|
|
the pattern matches, and after all the pattern variables have been bound (that's why the
|
|
condition can use the ``direction`` variable in the example above). If the pattern
|
|
matches and the condition is truthy, the body of the case clause runs normally. If the
|
|
pattern matches but the condition is falsy, the match statement proceeds to check the
|
|
next ``case`` clause as if the pattern hadn't matched (with the possible side-effect of
|
|
having already bound some variables).
|
|
|
|
The sequence of these steps must be considered carefully when combining or-patterns and
|
|
guards. If you have ``case [x, 100] | [0, x] if x > 10`` and your subject is
|
|
``[0, 100]``, the clause will be skipped. This happens because:
|
|
|
|
* The or-pattern finds the first alternative that matches the subject, which happens to
|
|
be ``[x, 100]``
|
|
* ``x`` is bound to 0
|
|
* The condition x > 10 is checked. Given that it's false, the whole case clause is
|
|
skipped. The ``[0, x]`` pattern is never attempted.
|
|
|
|
Going to the cloud: Mappings
|
|
----------------------------
|
|
|
|
TODO: Give the motivating example of network requests, describe JSON based "protocol"
|
|
|
|
TODO: partial matches, double stars
|
|
|
|
Matching objects
|
|
----------------
|
|
|
|
UI events motivations. describe events in dataclasses. inspiration for event objects
|
|
can be taken from https://www.pygame.org/docs/ref/event.html
|
|
|
|
example of getting constants from module (like key names for keyboard events)
|
|
|
|
customizing match_args?
|
|
|
|
|
|
.. _Appendix A:
|
|
|
|
Appendix A -- Quick Intro
|
|
=========================
|
|
|
|
A ``match`` statement takes an expression and compares it to successive
|
|
patterns given as one or more ``case`` blocks. This is superficially
|
|
similar to a ``switch`` statement in C, Java or JavaScript (and many
|
|
other languages), but much more powerful.
|
|
|
|
The simplest form compares a subject value against one or more literals::
|
|
|
|
def http_error(status):
|
|
match status:
|
|
case 400:
|
|
return "Bad request"
|
|
case 401:
|
|
return "Unauthorized"
|
|
case 403:
|
|
return "Forbidden"
|
|
case 404:
|
|
return "Not found"
|
|
case 418:
|
|
return "I'm a teapot"
|
|
case _:
|
|
return "Something's wrong with the Internet"
|
|
|
|
Note the last block: the "variable name" ``_`` acts as a *wildcard* and
|
|
never fails to match.
|
|
|
|
You can combine several literals in a single pattern using ``|`` ("or")::
|
|
|
|
case 401 | 403 | 404:
|
|
return "Not allowed"
|
|
|
|
Patterns can look like unpacking assignments, and can be used to bind
|
|
variables::
|
|
|
|
# The subject is an (x, y) tuple
|
|
match point:
|
|
case (0, 0):
|
|
print("Origin")
|
|
case (0, y):
|
|
print(f"Y={y}")
|
|
case (x, 0):
|
|
print(f"X={x}")
|
|
case (x, y):
|
|
print(f"X={x}, Y={y}")
|
|
case _:
|
|
raise ValueError("Not a point")
|
|
|
|
Study that one carefully! The first pattern has two literals, and can
|
|
be thought of as an extension of the literal pattern shown above. But
|
|
the next two patterns combine a literal and a variable, and the
|
|
variable *captures* a value from the subject (``point``). The fourth
|
|
pattern captures two values, which makes it conceptually similar to
|
|
the unpacking assignment ``(x, y) = point``.
|
|
|
|
If you are using classes to structure your data (e.g. data classes)
|
|
you can use the class name followed by an argument list resembling a
|
|
constructor, but with the ability to capture variables::
|
|
|
|
from dataclasses import dataclass
|
|
|
|
@dataclass
|
|
class Point:
|
|
x: int
|
|
y: int
|
|
|
|
def whereis(point):
|
|
match point:
|
|
case Point(0, 0):
|
|
print("Origin")
|
|
case Point(0, y):
|
|
print(f"Y={y}")
|
|
case Point(x, 0):
|
|
print(f"X={x}")
|
|
case Point():
|
|
print("Somewhere else")
|
|
case _:
|
|
print("Not a point")
|
|
|
|
We can use keyword parameters too. The following patterns are all
|
|
equivalent (and all bind the ``y`` attribute to the ``var`` variable)::
|
|
|
|
Point(1, var)
|
|
Point(1, y=var)
|
|
Point(x=1, y=var)
|
|
Point(y=var, x=1)
|
|
|
|
Patterns can be arbitrarily nested. For example, if we have a short
|
|
list of points, we could match it like this::
|
|
|
|
match points:
|
|
case []:
|
|
print("No points")
|
|
case [Point(0, 0)]:
|
|
print("The origin")
|
|
case [Point(x, y)]:
|
|
print(f"Single point {x}, {y}")
|
|
case [Point(0, y1), Point(0, y2)]:
|
|
print(f"Two on the Y axis at {y1}, {y2}")
|
|
case _:
|
|
print("Something else")
|
|
|
|
We can add an ``if`` clause to a pattern, known as a "guard". If the
|
|
guard is false, ``match`` goes on to try the next ``case`` block. Note
|
|
that value capture happens before the guard is evaluated::
|
|
|
|
match point:
|
|
case Point(x, y) if x == y:
|
|
print(f"Y=X at {x}")
|
|
case Point(x, y):
|
|
print(f"Not on the diagonal")
|
|
|
|
Several other key features:
|
|
|
|
- Like unpacking assignments, tuple and list patterns have exactly the
|
|
same meaning and actually match arbitrary sequences. An important
|
|
exception is that they don't match iterators or strings.
|
|
(Technically, the subject must be an instance of
|
|
``collections.abc.Sequence``.)
|
|
|
|
- Sequence patterns support wildcards: ``[x, y, *rest]`` and ``(x, y,
|
|
*rest)`` work similar to wildcards in unpacking assignments. The
|
|
name after ``*`` may also be ``_``, so ``(x, y, *_)`` matches a sequence
|
|
of at least two items without binding the remaining items.
|
|
|
|
- Mapping patterns: ``{"bandwidth": b, "latency": l}`` captures the
|
|
``"bandwidth"`` and ``"latency"`` values from a dict. Unlike sequence
|
|
patterns, extra keys are ignored. A wildcard ``**rest`` is also
|
|
supported. (But ``**_`` would be redundant, so it not allowed.)
|
|
|
|
- Subpatterns may be captured using the walrus (``:=``) operator::
|
|
|
|
case (Point(x1, y1), p2 := Point(x2, y2)): ...
|
|
|
|
- Patterns may use named constants. These must be dotted names
|
|
to prevent them from being interpreted as capture variable::
|
|
|
|
from enum import Enum
|
|
class Color(Enum):
|
|
RED = 0
|
|
GREEN = 1
|
|
BLUE = 2
|
|
|
|
match color:
|
|
case Color.RED:
|
|
print("I see red!")
|
|
case Color.GREEN:
|
|
print("Grass is green")
|
|
case Color.BLUE:
|
|
print("I'm feeling the blues :(")
|
|
|
|
- The literals ``None``, ``False`` and ``True`` are treated specially:
|
|
comparisons to the subject are done using ``is``. This::
|
|
|
|
match b:
|
|
case True:
|
|
print("Yes!")
|
|
|
|
is exactly equivalent to this::
|
|
|
|
if b is True:
|
|
print("Yes!")
|
|
|
|
- Classes may override the mapping from positional arguments to
|
|
attributes by setting a class variable ``__match_args__``.
|
|
Read about it in PEP 634.
|
|
|
|
|
|
Copyright
|
|
=========
|
|
|
|
This document is placed in the public domain or under the
|
|
CC0-1.0-Universal license, whichever is more permissive.
|
|
|
|
|
|
..
|
|
Local Variables:
|
|
mode: indented-text
|
|
indent-tabs-mode: nil
|
|
sentence-end-double-space: t
|
|
fill-column: 70
|
|
coding: utf-8
|
|
End:
|