PEP: 636
Title: Structural Pattern Matching: Tutorial
Version: $Revision$
Last-Modified: $Date$
Author: Daniel F Moisset, Tobias Kohn
Sponsor: Guido van Rossum
BDFL-Delegate:
Discussions-To: Python-Dev
Status: Draft
Type: Informational
Content-Type: text/x-rst
Created: 12-Sep-2020
Python-Version: 3.10
Post-History:
Resolution:


Abstract
========

**NOTE:** This draft is incomplete and not intended for review yet.
We're checking it into the peps repo for the convenience of the authors.

This PEP is a tutorial for the pattern matching introduced by PEP 634.

PEP 622 proposed syntax for pattern matching, which received detailed
discussion both from the community and the Steering Council. A frequent
concern was about how easy it would be to explain (and learn) this feature.
This PEP addresses that concern by providing the kind of document which
developers could use to learn about pattern matching in Python.

This is considered supporting material for PEP 634 (the technical
specification for pattern matching) and PEP 635 (the motivation and rationale
for having pattern matching and design considerations).

For readers who are looking more for a quick review than for a tutorial,
see `Appendix A`_.

Meta
====

This section is intended to get in sync about style and language with
co-authors. It should be removed from the released PEP.

The following are design decisions I made while writing this:

1. Who is the target audience?
   I'm considering "People with general Python experience" (i.e. who
   shouldn't be surprised at anything in the Python tutorial), but not
   necessarily involved with the design/development of Python. I'm assuming
   someone who hasn't been exposed to pattern matching in other languages.

2. How detailed should this document be?
   I considered a range from "very superficial" (like the detail level you
   might find about statements in the Python tutorial) to "terse but
   complete" like https://github.com/gvanrossum/patma/#tutorial to
   "long and detailed".
   I chose the latter; we can always trim down from that.

3. What kind of examples to use?
   I tried to write examples that are code that I might write using pattern
   matching. I avoided going for a full application (because the examples I
   have in mind are too large for a PEP) but I tried to follow ideas related
   to a single project to thread the story-telling more easily. This is
   probably the most controversial thing here, and if the rest of the authors
   dislike it, we can change to a more formal explanatory style.

Other rules I'm following (let me know if I forgot to):

* I'm not going to reference/compare with other languages
* I'm not trying to convince the reader that this is a good idea (that's the
  job of PEP 635) just explain how to use it
* I'm not trying to cover every corner case (that's the job of PEP 634), just
  cover how to use the full functionality in the "normal" cases.
* I talk to the learner in second person

Tutorial
========

As an example to motivate this tutorial, you will be writing a text-adventure.
That is a form of interactive fiction where the user enters text commands to
interact with a fictional world and receives text descriptions of what happens.
Commands will be simplified forms of natural language like ``get sword``,
``attack dragon``, ``go north``, ``enter shop`` or ``buy cheese``.

Matching sequences
------------------

Your main loop will need to get input from the user and split it into words,
let's say a list of strings like this::

    command = input("What are you doing next? ")
    # analyze the result of command.split()

The next step is to interpret the words. Most of our commands will have two
words: an action and an object. So you may be tempted to do the following::

    [action, obj] = command.split()
    ... # interpret action, obj

The problem with that line of code is that it's missing something: what if
the user types more or fewer than 2 words?
To prevent this problem you can either check the length of the list of words,
or catch the ``ValueError`` that the statement above would raise.

You can use a ``match`` statement instead::

    match command.split():
        case [action, obj]:
            ... # interpret action, obj

The ``match`` statement evaluates the **subject** after the ``match`` keyword,
and checks it against the **pattern** next to ``case``. A pattern is able to
do two different things:

* Verify that the subject has a certain structure. In your case, the
  ``[action, obj]`` pattern matches any sequence of exactly two elements.
  This is called **matching**.
* Bind some names in the pattern to component elements of your subject. In
  this case, if the list has two elements, it will bind
  ``action = subject[0]`` and ``obj = subject[1]``. This is called
  **destructuring**.

If there's a match, the statements inside the ``case`` clause will be executed
with the bound variables. If there's no match, nothing happens and execution
continues with the next statement after the ``match``.

TODO: discuss other sequences, tuples. Discuss syntax with parentheses.
discuss iterators? discuss [x, x] possibly later on?

Matching multiple patterns
--------------------------

Even if most commands have the action/object form, you might want to have user
commands of different lengths. For example you might want to add single verbs
with no object like ``look`` or ``quit``. A match statement can (and is likely
to) have more than one ``case``::

    match command.split():
        case [action]:
            ... # interpret single-verb action
        case [action, obj]:
            ... # interpret action, obj

The ``match`` statement will check patterns from top to bottom. If the pattern
doesn't match the subject, the next pattern will be tried. However, once the
*first* matching ``case`` clause is found, the body of that clause is
executed, and all further ``case`` clauses are ignored. This is similar to the
way that an ``if/elif/elif/...`` statement works.

Matching specific values
------------------------

Your code still needs to look at the specific actions and conditionally run
different logic depending on the specific action (e.g., ``quit``, ``attack``,
or ``buy``). You could do that using a chain of ``if/elif/elif/...``, or using
a dictionary of functions, but here we'll leverage pattern matching to solve
that task. Instead of a variable, you can use literal values in patterns (like
``"quit"``, ``42``, or ``None``). This allows you to write::

    match command.split():
        case ["quit"]:
            print("Goodbye!")
            quit_game()
        case ["look"]:
            current_room.describe()
        case ["get", obj]:
            character.get(obj, current_room)
        case ["go", direction]:
            current_room = current_room.neighbor(direction)
        # The rest of your commands go here

A pattern like ``["get", obj]`` will match only 2-element sequences that have
a first element equal to ``"get"``. When destructuring, it will bind
``obj = subject[1]``.

As you can see in the ``go`` case, we can also use different variable names in
different patterns.

FIXME: This *might* be the place to explain a bit that when I say "literal" I
mean it literally, and a "soft constant" will not work :)

Matching slices
---------------

A player may be able to drop multiple objects by using a series of commands
``drop key``, ``drop sword``, ``drop cheese``. This interface might be
cumbersome, and you might like to allow dropping multiple items in a single
command, like ``drop key sword cheese``. In this case you don't know
beforehand how many words will be in the command, but you can use extended
unpacking in patterns in the same way that it is allowed in assignments::

    match command.split():
        case ["drop", *objects]:
            for obj in objects:
                character.drop(obj, current_room)
        # The rest of your commands go here

This will match any sequence having ``"drop"`` as its first element. All
remaining elements will be captured in a ``list`` object which will be bound
to the ``objects`` variable.

This syntax has restrictions similar to those of sequence unpacking: you
cannot have more than one starred name in a pattern.

Adding a catch-all
------------------

You may want to print an error message saying that the command wasn't
recognized when all the patterns fail. You could use the feature we just
learned and write the following::

    match command.split():
        case ["quit"]: ... # Code omitted for brevity
        case ["go", direction]: ...
        case ["drop", *objects]: ...
        ... # Other case clauses
        case [*ignored_words]:
            print(f"Sorry, I couldn't understand {command!r}")

Note that you must add this last pattern at the end, otherwise it will match
before other possible patterns that could be considered. This works, but it's
a bit verbose and somewhat wasteful: it will make a full copy of the word
list, which will be bound to ``ignored_words`` even if it's never used.

You can use a special pattern, written ``_``, which always matches but doesn't
bind anything. That allows you to rewrite::

    match command.split():
        ... # Other case clauses
        case [*_]:
            print(f"Sorry, I couldn't understand {command!r}")

This pattern will match any sequence. In this case we can simplify even
more and match any object::

    match command.split():
        ... # Other case clauses
        case _:
            print(f"Sorry, I couldn't understand {command!r}")

TODO: Explain about syntaxerror when having an irrefutable pattern above
others?

How patterns are composed
-------------------------

This is a good moment to step back from the examples and understand how the
patterns that you have been using are built. Patterns can be nested within
each other, and we have been doing that implicitly in the examples above.

There are some "simple" patterns ("simple" here meaning that they do not
contain other patterns) that we've seen:

* **Literal patterns** (string literals, number literals, ``True``, ``False``,
  and ``None``)
* The **wildcard pattern** ``_``
* **Capture patterns** (stand-alone names like ``direction``, ``action``,
  ``objects``). We never discussed these separately, but used them as part of
  other patterns. Note that a capture pattern by itself will always match, and
  usually makes sense only as a catch-all at the end of your ``match`` if you
  desire to bind the name to the subject.

Until now, the only non-simple pattern we have experimented with is the
sequence pattern. Each element in a sequence pattern can in fact be any other
pattern. This means that you could write a pattern like
``["first", (left, right), *rest]``. This will match subjects which are a
sequence of at least two elements, where the first one is equal to
``"first"`` and the second one is in turn a sequence of two elements. It will
also bind ``left = subject[1][0]``, ``right = subject[1][1]``, and
``rest = subject[2:]``.

Alternate patterns
------------------

Going back to the adventure game example, you may find that you'd like to have
several patterns resulting in the same outcome. For example, you might want
the commands ``north`` and ``go north`` to be equivalent. You may also desire
to have aliases for ``get X``, ``pick up X`` and ``pick X up`` for any X.

The ``|`` symbol in patterns combines them as alternatives. You could for
example write::

    match command.split():
        ... # Other case clauses
        case ["north"] | ["go", "north"]:
            current_room = current_room.neighbor("north")
        case ["get", obj] | ["pick", "up", obj] | ["pick", obj, "up"]:
            ... # Code for picking up the given object

This is called an **or pattern** and will produce the expected result.
Patterns are attempted from left to right; this may be relevant to know what
is bound if more than one alternative matches.

An important restriction when writing or patterns is that all alternatives
should bind the same variables. So a pattern ``[1, x] | [2, y]`` is not
allowed because it would make it unclear which variable would be bound after
a successful match. ``[1, x] | [2, x]`` is perfectly fine and will always bind
``x`` if successful.

Capturing matched sub-patterns
------------------------------

The first version of our "go" command was written with a
``["go", direction]`` pattern. The change we made in our last version using
the pattern ``["north"] | ["go", "north"]`` has some benefits but also some
drawbacks in comparison: the latest version allows the alias, but also has the
direction hardcoded, which will force us to actually have separate patterns
for north/south/east/west. This leads to some code duplication, but at the
same time we get better input validation, and we will not be getting into that
branch if the command entered by the user is ``"go figure!"`` instead of a
direction.

We could try to get the best of both worlds doing the following (I'll omit the
aliased version without "go" for brevity)::

    match command.split():
        case ["go", ("north" | "south" | "east" | "west")]:
            current_room = current_room.neighbor(...)
            # how do I know which direction to go?

This code is a single branch, and it verifies that the word after "go" is
really a direction. But the code moving the player around needs to know which
one was chosen and has no way to do so. What we need is a pattern that behaves
like the or pattern but at the same time does a capture. We can do so with a
**walrus pattern**::

    match command.split():
        case ["go", direction := ("north" | "south" | "east" | "west")]:
            current_room = current_room.neighbor(direction)

The walrus pattern (named like that because the ``:=`` operator looks like a
sideways walrus) matches whatever pattern is on its right hand side, but also
binds the value to a name.

Adding conditions to patterns
-----------------------------

The patterns we have explored above can do some powerful data filtering, but
sometimes you may wish for the full power of a boolean expression. Let's say
that you would actually like to allow a "go" command only in a restricted set
of directions based on the possible exits from ``current_room``. We can
achieve that by adding a **guard** to our case-clause. Guards consist of the
``if`` keyword followed by any expression::

    match command.split():
        case ["go", direction] if direction in current_room.exits:
            current_room = current_room.neighbor(direction)
        case ["go", _]:
            print("Sorry, you can't go that way")

The guard is not part of the pattern, it's part of the case clause. It's only
checked if the pattern matches, and after all the pattern variables have been
bound (that's why the condition can use the ``direction`` variable in the
example above). If the pattern matches and the condition is truthy, the body
of the case clause runs normally. If the pattern matches but the condition is
falsy, the match statement proceeds to check the next ``case`` clause as if
the pattern hadn't matched (with the possible side-effect of having already
bound some variables).

The sequence of these steps must be considered carefully when combining
or-patterns and guards. If you have ``case [x, 100] | [0, x] if x > 10`` and
your subject is ``[0, 100]``, the clause will be skipped. This happens
because:

* The or-pattern finds the first alternative that matches the subject, which
  happens to be ``[x, 100]``.
* ``x`` is bound to 0.
* The condition ``x > 10`` is checked. Given that it's false, the whole case
  clause is skipped. The ``[0, x]`` pattern is never attempted.

Going to the cloud: Mappings
----------------------------

TODO: Give the motivating example of network requests, describe JSON based
"protocol"

TODO: partial matches, double stars

Matching objects
----------------

UI events motivations. describe events in dataclasses.
inspiration for event objects can be taken from
https://www.pygame.org/docs/ref/event.html

example of getting constants from module (like key names for keyboard events)

customizing match_args?


.. _Appendix A:

Appendix A -- Quick Intro
=========================

A ``match`` statement takes an expression and compares it to successive
patterns given as one or more ``case`` blocks. This is superficially similar
to a ``switch`` statement in C, Java or JavaScript (and many other languages),
but much more powerful.

The simplest form compares a subject value against one or more literals::

    def http_error(status):
        match status:
            case 400:
                return "Bad request"
            case 401:
                return "Unauthorized"
            case 403:
                return "Forbidden"
            case 404:
                return "Not found"
            case 418:
                return "I'm a teapot"
            case _:
                return "Something's wrong with the Internet"

Note the last block: the "variable name" ``_`` acts as a *wildcard* and never
fails to match.

You can combine several literals in a single pattern using ``|`` ("or")::

            case 401 | 403 | 404:
                return "Not allowed"

Patterns can look like unpacking assignments, and can be used to bind
variables::

    # The subject is an (x, y) tuple
    match point:
        case (0, 0):
            print("Origin")
        case (0, y):
            print(f"Y={y}")
        case (x, 0):
            print(f"X={x}")
        case (x, y):
            print(f"X={x}, Y={y}")
        case _:
            raise ValueError("Not a point")

Study that one carefully! The first pattern has two literals, and can be
thought of as an extension of the literal pattern shown above. But the next
two patterns combine a literal and a variable, and the variable *captures* a
value from the subject (``point``). The fourth pattern captures two values,
which makes it conceptually similar to the unpacking assignment
``(x, y) = point``.

If you are using classes to structure your data (e.g.
data classes) you can use the class name followed by an argument list
resembling a constructor, but with the ability to capture variables::

    from dataclasses import dataclass

    @dataclass
    class Point:
        x: int
        y: int

    def whereis(point):
        match point:
            case Point(0, 0):
                print("Origin")
            case Point(0, y):
                print(f"Y={y}")
            case Point(x, 0):
                print(f"X={x}")
            case Point():
                print("Somewhere else")
            case _:
                print("Not a point")

We can use keyword parameters too. The following patterns are all equivalent
(and all bind the ``y`` attribute to the ``var`` variable)::

    Point(1, var)
    Point(1, y=var)
    Point(x=1, y=var)
    Point(y=var, x=1)

Patterns can be arbitrarily nested. For example, if we have a short list of
points, we could match it like this::

    match points:
        case []:
            print("No points")
        case [Point(0, 0)]:
            print("The origin")
        case [Point(x, y)]:
            print(f"Single point {x}, {y}")
        case [Point(0, y1), Point(0, y2)]:
            print(f"Two on the Y axis at {y1}, {y2}")
        case _:
            print("Something else")

We can add an ``if`` clause to a pattern, known as a "guard". If the guard is
false, ``match`` goes on to try the next ``case`` block. Note that value
capture happens before the guard is evaluated::

    match point:
        case Point(x, y) if x == y:
            print(f"Y=X at {x}")
        case Point(x, y):
            print("Not on the diagonal")

Several other key features:

- Like unpacking assignments, tuple and list patterns have exactly the same
  meaning and actually match arbitrary sequences. An important exception is
  that they don't match iterators or strings. (Technically, the subject must
  be an instance of ``collections.abc.Sequence``.)

- Sequence patterns support wildcards: ``[x, y, *rest]`` and
  ``(x, y, *rest)`` work similarly to wildcards in unpacking assignments. The
  name after ``*`` may also be ``_``, so ``(x, y, *_)`` matches a sequence of
  at least two items without binding the remaining items.

- Mapping patterns: ``{"bandwidth": b, "latency": l}`` captures the
  ``"bandwidth"`` and ``"latency"`` values from a dict.
  Unlike sequence patterns, extra keys are ignored. A wildcard ``**rest`` is
  also supported. (But ``**_`` would be redundant, so it is not allowed.)

- Subpatterns may be captured using the walrus (``:=``) operator::

      case (Point(x1, y1), p2 := Point(x2, y2)): ...

- Patterns may use named constants. These must be dotted names to prevent
  them from being interpreted as capture variables::

      from enum import Enum

      class Color(Enum):
          RED = 0
          GREEN = 1
          BLUE = 2

      match color:
          case Color.RED:
              print("I see red!")
          case Color.GREEN:
              print("Grass is green")
          case Color.BLUE:
              print("I'm feeling the blues :(")

- The literals ``None``, ``False`` and ``True`` are treated specially:
  comparisons to the subject are done using ``is``. This::

      match b:
          case True:
              print("Yes!")

  is exactly equivalent to this::

      if b is True:
          print("Yes!")

- Classes may override the mapping from positional arguments to attributes by
  setting a class variable ``__match_args__``. Read about it in PEP 634.


Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: