PEP: 622
Title: Structural Pattern Matching
Version: $Revision$
Last-Modified: $Date$
Author: Brandt Bucher <brandtbucher@gmail.com>,
        Daniel F Moisset <dfmoisset@gmail.com>,
        Tobias Kohn <kohnt@tobiaskohn.ch>,
        Ivan Levkivskyi <levkivskyi@gmail.com>,
        Guido van Rossum <guido@python.org>,
        Talin <viridia@gmail.com>
BDFL-Delegate:
Discussions-To: Python-Dev <python-dev@python.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 23-Jun-2020
Python-Version: 3.10
Post-History: 23-Jun-2020, 8-Jul-2020
Resolution:


Abstract
========

This PEP proposes to add a **pattern matching statement** to Python,
inspired by similar syntax found in Scala, Erlang, and other languages.

Patterns and shapes
-------------------

The **pattern syntax** builds on Python’s existing syntax for sequence
unpacking (e.g., ``a, b = value``).

A ``match`` statement compares a value (the **subject**)
to several different shapes (the **patterns**) until a shape fits.
Each pattern describes the type and structure of the accepted values
as well as the variables in which to capture their contents.

Patterns can specify the shape to be:

- a sequence to be unpacked, as already mentioned
- a mapping with specific keys
- an instance of a given class with (optionally) specific attributes
- a specific value
- a wildcard

Patterns can be composed in several ways.

Syntax
------

Syntactically, a ``match`` statement contains:

- a *subject* expression
- one or more ``case`` clauses

Each ``case`` clause specifies:

- a pattern (the overall shape to be matched)
- an optional “guard” (a condition to be checked if the pattern matches)
- a code block to be executed if the case clause is selected

Motivation
----------

The rest of the PEP:

- motivates why we believe pattern matching makes a good addition to Python
- explains our design choices
- contains a precise syntactic and runtime specification
- gives guidance for static type checkers (and one small addition to the ``typing`` module)
- discusses the main objections and alternatives that have been
  brought up during extensive discussion of the proposal, both within
  the group of authors and in the python-dev community

Finally, we discuss some possible extensions that might be considered
in the future, once the community has ample experience with the
currently proposed syntax and semantics.

.. _overview:

Overview
========

Patterns are a new syntactical category with their own rules
and special cases. Patterns mix input (given values) and output
(captured variables) in novel ways. They may take a little time to
use effectively. The authors have provided
a brief introduction to the basic concepts here. Note that this section
is not intended to be complete or entirely accurate.

Pattern, a new syntactic construct, and destructuring
-----------------------------------------------------

A new syntactic construct called **pattern** is introduced in this
PEP. Syntactically, patterns look like a subset of expressions.
The following are examples of patterns:

- ``[first, second, *rest]``
- ``Point2d(x, 0)``
- ``{"name": "Bruce", "age": age}``
- ``42``

The above expressions may look like examples of object construction
with a constructor which takes some values as parameters and
builds an object from those components.

When viewed as patterns, the above denote the inverse operation of
construction, which we call **destructuring**. **Destructuring** takes a subject value
and extracts its components.

The syntactic similarity between object construction and destructuring is
intentional. It also follows the existing
Pythonic style in which assignment targets (write contexts) look
like expressions (read contexts).

Pattern matching never creates objects, in the same way that
``[a, b] = my_list`` neither creates a
new ``[a, b]`` list nor reads the values of ``a`` and ``b``.


Matching process
----------------

.. **Reword**

The intuition we are trying to build in users as they learn this is
that matching a pattern to a subject binds the pattern's free variables (if any)
to components of the subject, so that the pattern, read as an
expression, reflects the original subject.

During this matching process,
the structure of the pattern may not fit the subject, in which case matching *fails*.

For example, matching the pattern ``Point2d(x, 0)`` to the subject
``Point2d(3, 0)`` succeeds. The match also **binds**
the pattern's free variable ``x`` to the subject's value ``3``.

As another example, if the subject is ``[3, 0]``, the match fails
because the subject's type ``list`` is not the pattern's ``Point2d``.

As a third example, if the subject is
``Point2d(3, 7)``, the match fails because the
subject's second coordinate ``7`` is not the same as the pattern's ``0``.

The ``match`` statement tries to match a single subject to each of the
patterns in its ``case`` clauses. At the first
successful match to a pattern in a ``case`` clause:

- the variables in the pattern are assigned, and
- a corresponding block is executed.

Each ``case`` clause can also specify an optional boolean condition,
known as a **guard**.

Let's look at a more detailed example of a ``match`` statement. Here, the
``match`` statement is used within a function that builds
3D points. The function accepts as input any of
the following: a tuple with 2 elements, a tuple with 3 elements, an
existing Point2d object, or an existing Point3d object::

    def make_point_3d(pt):
        match pt:
            case (x, y):
                return Point3d(x, y, 0)
            case (x, y, z):
                return Point3d(x, y, z)
            case Point2d(x, y):
                return Point3d(x, y, 0)
            case Point3d(_, _, _):
                return pt
            case _:
                raise TypeError("not a point we support")

Without pattern matching, this function's implementation would require several
``isinstance()`` checks, one or two ``len()`` calls, and a more
convoluted control flow. The ``match`` version and the traditional
Python version without ``match`` translate into similar code under the hood.
Once familiar with pattern matching, a user reading this function
will likely find the ``match`` version clearer than the traditional approach.


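For comparison, the traditional version alluded to above might look roughly
like the following sketch. It is a hand-written approximation, not code
produced by any implementation of this PEP. The ``Point2d`` and ``Point3d``
dataclasses and the ``is_sequence()`` helper are hypothetical stand-ins
introduced only for illustration.

```python
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass
class Point2d:  # hypothetical stand-in for the class used in the example
    x: float
    y: float


@dataclass
class Point3d:  # hypothetical stand-in for the class used in the example
    x: float
    y: float
    z: float


def is_sequence(obj):
    # Sequence patterns accept sequences, but never strings or bytes.
    return isinstance(obj, Sequence) and not isinstance(obj, (str, bytes, bytearray))


def make_point_3d(pt):
    if is_sequence(pt) and len(pt) == 2:      # case (x, y):
        x, y = pt
        return Point3d(x, y, 0)
    elif is_sequence(pt) and len(pt) == 3:    # case (x, y, z):
        x, y, z = pt
        return Point3d(x, y, z)
    elif isinstance(pt, Point2d):             # case Point2d(x, y):
        return Point3d(pt.x, pt.y, 0)
    elif isinstance(pt, Point3d):             # case Point3d(_, _, _):
        return pt
    else:                                     # case _:
        raise TypeError("not a point we support")
```

Each branch repeats a type check and a length check that the ``match``
version expresses once, as part of the pattern itself.
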
Rationale and Goals
===================

Python programs frequently need to handle data which varies in type,
presence of attributes/keys, or number of elements. Typical examples
are operating on nodes of a mixed structure like an AST, handling UI
events of different types, processing structured input (like
structured files or network messages), or “parsing” arguments for a
function that can accept different combinations of types and numbers
of parameters. In fact, the classic 'visitor' pattern is an example of this,
done in an OOP style -- but matching makes it much less tedious to write.

Much of the code to do so tends to consist of complex chains of nested
``if``/``elif`` statements, including multiple calls to ``len()``,
``isinstance()`` and index/key/attribute access. Inside those branches
users sometimes need to destructure the data further to extract the
required component values, which may be nested several objects deep.

Pattern matching as present in many other languages provides an
elegant solution to this problem. These range from statically compiled
functional languages like F# and Haskell, via mixed-paradigm languages
like Scala [4]_ and Rust [3]_, to dynamic languages like Elixir and
Ruby; pattern matching is also under consideration for JavaScript. We are indebted to
these languages for guiding the way to Pythonic pattern matching, as
Python is indebted to so many other languages for many of its
features: many basic syntactic features were inherited from C,
exceptions from Modula-3, classes were inspired by C++, slicing came
from Icon, regular expressions from Perl, decorators resemble Java
annotations, and so on.

The usual logic for operating on heterogeneous data can be summarized
in the following way:

- Some analysis is done on the *shape* (type and components) of the
  data: This could involve ``isinstance()`` or ``len()`` calls and/or extracting
  components (via indexing or attribute access) which are checked for
  specific values or conditions.
- If the shape is as expected, some more components are possibly
  extracted and some operation is done using the extracted values.

Take for example `this piece of the Django web framework
<https://github.com/django/django/blob/5166097d7c80cab757e44f2d02f3d148fbbc2ff6/django/db/models/enums.py#L13>`_::

    if (
        isinstance(value, (list, tuple)) and
        len(value) > 1 and
        isinstance(value[-1], (Promise, str))
    ):
        *value, label = value
        value = tuple(value)
    else:
        label = key.replace('_', ' ').title()

We can see the shape analysis of the ``value`` at the top, followed
by the destructuring inside.

Note that shape analysis here involves checking the types both of the
container and of one of its components, and some checks on its number
of elements. Once we match the shape, we need to decompose the
sequence. With the proposal in this PEP, we could rewrite that code
into this::

    match value:
        case [*v, label := (Promise() | str())] if v:
            value = tuple(v)
        case _:
            label = key.replace('_', ' ').title()

This syntax makes much more explicit which formats are possible for
the input data, and which components are extracted from where. You can
see a pattern similar to list unpacking, but also type checking: the
``Promise()`` pattern is not an object construction, but represents
anything that's an instance of ``Promise``. The pattern operator ``|``
separates alternative patterns (not unlike regular expressions or EBNF
grammars), and ``_`` is a wildcard. (Note that the match syntax used
here will accept user-defined sequences, as well as lists and tuples.)

On some occasions, extraction of information is not as relevant as
identifying structure. Take the following example from the
`Python standard library
<https://github.com/python/cpython/blob/c4cacc8/Lib/lib2to3/fixer_util.py#L158>`_::

    def is_tuple(node):
        if isinstance(node, Node) and node.children == [LParen(), RParen()]:
            return True
        return (isinstance(node, Node)
                and len(node.children) == 3
                and isinstance(node.children[0], Leaf)
                and isinstance(node.children[1], Node)
                and isinstance(node.children[2], Leaf)
                and node.children[0].value == "("
                and node.children[2].value == ")")

This code shows an example of finding out the "shape" of the data
without doing significant extraction. It is not very easy to
read, and the intended shape that it is trying to match is not
evident. Compare with the updated code using the proposed syntax::

    def is_tuple(node: Node) -> bool:
        match node:
            case Node(children=[LParen(), RParen()]):
                return True
            case Node(children=[Leaf(value="("), Node(), Leaf(value=")")]):
                return True
            case _:
                return False

Note that the proposed code will work without any modifications to the
definition of ``Node`` and other classes here. As shown in the
examples above, the proposal supports not just unpacking sequences, but
also doing ``isinstance`` checks (like ``LParen()`` or ``str()``),
looking into object attributes (``Leaf(value="(")`` for example) and
comparisons with literals.

That last feature helps with some kinds of code which look more like
the "switch" statement as present in other languages::

    match response.status:
        case 200:
            do_something(response.data)  # OK
        case 301 | 302:
            retry(response.location)  # Redirect
        case 401:
            retry(auth=get_credentials())  # Login first
        case 426:
            sleep(DELAY)  # Server is swamped, try after a bit
            retry()
        case _:
            raise RequestError("we couldn't get the data")

Although this will work, it's not necessarily what the proposal is
focused on, and the new syntax has been designed to best support the
destructuring scenarios.

See the `syntax`_ section below for a more detailed specification.

We propose that destructuring objects can be customized by a new
special ``__match_args__`` attribute. As part of this PEP we specify
the general API and its implementation for some standard library
classes (including named tuples and dataclasses). See the `runtime`_
section below.

Finally, we aim to provide comprehensive support for static type
checkers and similar tools. For this purpose, we propose to introduce
a ``@typing.sealed`` class decorator that will be a no-op at runtime
but will indicate to static tools that all sub-classes of this class
must be defined in the same module. This will allow effective static
exhaustiveness checks, and together with dataclasses, will provide
basic support for algebraic data types [2]_. See the `static checkers`_
section for more details.


.. _syntax:

Syntax and Semantics
====================

Patterns
--------

The **pattern** is a new syntactic construct that could be considered a loose
generalization of assignment targets. The key properties of a pattern are what
types and shapes of subjects it accepts, what variables it captures and how
it extracts them from the subject. For example, the pattern ``[a, b]`` matches
only sequences of exactly 2 elements, extracting the first element into ``a``
and the second one into ``b``.

This PEP defines several types of patterns. These are certainly not the
only possible ones, so the design decision was made to choose a subset of
functionality that is useful now but conservative. More patterns can be added
later as this feature gets more widespread use. See the `rejected ideas`_
and `deferred ideas`_ sections for more details.

The patterns listed here are described in more detail below, but summarized
together in this section for simplicity:

- A **literal pattern** is useful to filter constant values in a structure.
  It looks like a Python literal (including some values like ``True``,
  ``False`` and ``None``). It only matches objects equal to the literal, and
  never binds.
- A **capture pattern** looks like ``x`` and is equivalent to an identical
  assignment target: it always matches and binds the variable
  with the given (simple) name.
- The **wildcard pattern** is a single underscore: ``_``. It always matches,
  but does not capture any variable (which prevents interference with other
  uses for ``_`` and allows for some optimizations).
- A **constant value pattern** works like the literal but for certain named
  constants. Note that it must be a qualified (dotted) name, given the possible
  ambiguity with a capture pattern. It looks like ``Color.RED`` and
  only matches values equal to the corresponding value. It never binds.
- A **sequence pattern** looks like ``[a, *rest, b]`` and is similar to
  a list unpacking. An important difference is that the elements nested
  within it can be any kind of patterns, not just names or sequences.
  It matches only sequences of appropriate length, as long as all the sub-patterns
  also match. It makes all the bindings of its sub-patterns.
- A **mapping pattern** looks like ``{"user": u, "emails": [*es]}``. It matches
  mappings with at least the set of provided keys, and if all the
  sub-patterns match their corresponding values. It binds whatever the
  sub-patterns bind while matching with the values corresponding to the keys.
  Adding ``**rest`` at the end of the pattern to capture extra items is allowed.
- A **class pattern** is similar to the above but matches attributes instead
  of keys. It looks like ``datetime.date(year=y, day=d)``. It matches
  instances of the given type, having at least the specified
  attributes, as long as the attributes match with the corresponding
  sub-patterns. It binds whatever the sub-patterns bind when matching with the
  values of the given attributes. An optional protocol also allows matching
  positional arguments.
- An **OR pattern** looks like ``[*x] | {"elems": [*x]}``. It matches if any
  of its sub-patterns match. It uses the binding for the leftmost pattern
  that matched.
- A **walrus pattern** looks like ``d := datetime(year=2020, month=m)``. It
  matches only if its sub-pattern also matches. It binds whatever the
  sub-pattern match does, and also binds the named variable to the entire object.

The ``match`` statement
-----------------------

A simplified, approximate grammar for the proposed syntax is::

    ...
    compound_statement:
        | if_stmt
        ...
        | match_stmt
    match_stmt: "match" expression ':' NEWLINE INDENT case_block+ DEDENT
    case_block: "case" pattern [guard] ':' block
    guard: 'if' expression
    pattern: walrus_pattern | or_pattern
    walrus_pattern: NAME ':=' or_pattern
    or_pattern: closed_pattern ('|' closed_pattern)*
    closed_pattern:
        | literal_pattern
        | capture_pattern
        | wildcard_pattern
        | constant_pattern
        | sequence_pattern
        | mapping_pattern
        | class_pattern

See `Appendix A`_ for the full, unabridged grammar. The simplified grammars in
this section are there to help the reader, not to serve as a full specification.

We propose that the match operation should be a statement, not an expression.
Although in
many languages it is an expression, being a statement better suits the general
logic of Python syntax. See `rejected ideas`_ for more discussion.
The allowed patterns are described in detail below in the `patterns`_
subsection.

The ``match`` and ``case`` keywords are proposed to be soft keywords,
so that they are recognized as keywords at the beginning of a match
statement or case block respectively, but are allowed to be used in
other places as variable or argument names.

The proposed indentation structure is as follows::

    match some_expression:
        case pattern_1:
            ...
        case pattern_2:
            ...

Here, ``some_expression`` represents the value that is being matched against,
which will be referred to hereafter as the *subject* of the match.


Match semantics
---------------

The proposed large-scale semantics for choosing the match is to choose the first
matching pattern and execute the corresponding suite. The remaining patterns
are not tried. If there are no matching patterns, the statement 'falls
through', and execution continues at the following statement.

Essentially this is equivalent to a chain of ``if ... elif ... else``
statements. Note that unlike for the previously proposed ``switch`` statement,
the pre-computed dispatch dictionary semantics does not apply here.

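The first-match and fall-through behavior can be pictured with an ordinary
``if``/``elif`` chain. The following is a minimal sketch only; the function
``classify()`` and its result strings are hypothetical, and the comments show
the ``case`` clause each branch stands in for.

```python
def classify(subject):
    result = "unmatched"
    if subject == 0:                       # case 0:
        result = "zero"
    elif subject == 0 or subject == 1:     # case 0 | 1:
        # For subject 0 this branch is never reached: the first
        # matching clause wins and later clauses are not tried.
        result = "zero or one"
    # There is no catch-all clause, so any other subject falls through
    # and execution simply continues here.
    return result
```

With this sketch, ``classify(0)`` selects the first clause even though the
second would also match, and ``classify(5)`` selects no clause at all.
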
There is no ``default`` or ``else`` case - instead the special wildcard
``_`` can be used (see the section on `wildcard_pattern`_) as a final
'catch-all' pattern.

Name bindings made during a successful pattern match outlive the executed suite
and can be used after the match statement. This follows the logic of other
Python statements that can bind names, such as the ``for`` loop and the ``with``
statement. For example::

    match shape:
        case Point(x, y):
            ...
        case Rectangle(x, y, _, _):
            ...
    print(x, y)  # This works

During failed pattern matches, some sub-patterns may succeed. For example,
while matching the value ``[0, 1, 2]`` with the pattern ``(0, x, 1)``, the
sub-pattern ``x`` may succeed if the list elements are matched from left to right.
The implementation may choose to either make persistent bindings for those
partial matches or not. User code including a ``match`` statement should not rely
on the bindings being made for a failed match, but also shouldn't assume that
variables are unchanged by a failed match. This part of the behavior is
left intentionally unspecified so different implementations can add
optimizations, and to prevent introducing semantic restrictions that could
limit the extensibility of this feature.

Note that some pattern types below define more specific rules about when
the binding is made.

.. _patterns:

Allowed patterns
----------------

We introduce the proposed syntax gradually. Here we start from the main
building blocks. The following patterns are supported:


.. _literal_pattern:

Literal Patterns
~~~~~~~~~~~~~~~~

Simplified syntax::

    literal_pattern:
        | number
        | string
        | 'None'
        | 'True'
        | 'False'


A literal pattern consists of a simple literal like a string, a number,
a Boolean literal (``True`` or ``False``), or ``None``::

    match number:
        case 0:
            print("Nothing")
        case 1:
            print("Just one")
        case 2:
            print("A couple")
        case -1:
            print("One less than nothing")
        case 1-1j:
            print("Good luck with that...")

A literal pattern uses equality with the literal on the right-hand side, so that
in the above example ``number == 0`` and then possibly ``number == 1``, etc.
will be evaluated. Note that although technically negative numbers
are represented using unary minus, they are considered
literals for the purpose of pattern matching. Unary plus is not allowed.
Binary plus and minus are allowed only to join a real number and an imaginary
number to form a complex number, such as ``1+1j``.

Note that because equality (``__eq__``) is used, and because of the equivalence
between Booleans and the integers ``0`` and ``1``, there is no
practical difference between the following two::

    case True:
        ...

    case 1:
        ...

Triple-quoted strings are supported. Raw strings and byte strings
are supported. F-strings are not allowed (since in general they are not
really literals).


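Because literal patterns compare by equality, the consequences described
above can be checked with ordinary ``==`` comparisons. The helper
``matches_literal()`` below is a hypothetical illustration of that single
rule, not part of the proposal.

```python
def matches_literal(subject, literal):
    # A literal pattern succeeds when ``subject == literal`` is true,
    # with the literal on the right-hand side of the comparison.
    return subject == literal


# ``case True:`` and ``case 1:`` accept the same subjects,
# because ``1 == True`` in Python.
both_accept_one = matches_literal(1, True) and matches_literal(1, 1)

# A complex literal such as ``1-1j`` compares equal to complex(1, -1).
accepts_complex = matches_literal(complex(1, -1), 1 - 1j)

# A string never equals a number, so ``case 1:`` rejects "1".
rejects_string = not matches_literal("1", 1)
```
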
.. _capture_pattern:

Capture Patterns
~~~~~~~~~~~~~~~~

Simplified syntax::

    capture_pattern: NAME

A capture pattern serves as an assignment target for the matched expression::

    match greeting:
        case "":
            print("Hello!")
        case name:
            print(f"Hi {name}!")

Only a single name is allowed (a dotted name is a constant value pattern).
A capture pattern always succeeds. A capture pattern appearing in a scope makes
the name local to that scope. For example, using ``name`` after the above
snippet may raise ``UnboundLocalError`` rather than ``NameError``, if
the ``""`` case clause was taken::

    match greeting:
        case "":
            print("Hello!")
        case name:
            print(f"Hi {name}!")
    if name == "Santa":      # <-- might raise UnboundLocalError
        ...                  # but works fine if greeting was not empty

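The scoping rule is the same one that already applies to ordinary
assignments inside a function. The sketch below rewrites the example by hand
without ``match``; the function name ``greet()`` is hypothetical, and the
comments show which ``case`` clause each branch stands in for.

```python
def greet(greeting):
    if greeting == "":          # case "":
        print("Hello!")
    else:                       # case name:
        name = greeting         # the capture pattern binds a local name
        print(f"Hi {name}!")
    # ``name`` is local to this function because it is assigned somewhere
    # in it, so reading it on the path that skipped the assignment raises
    # UnboundLocalError rather than a plain NameError.
    return name
```

Calling ``greet("Santa")`` returns the bound name, while ``greet("")`` raises
``UnboundLocalError`` at the ``return`` statement.
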
While matching against each case clause, a name may be bound at most
once; having two capture patterns with coinciding names is an error::

    match data:
        case [x, x]:  # Error!
            ...

Note: one can still match on a collection with equal items using `guards`_.
Also, ``[x, y] | Point(x, y)`` is a legal pattern because the two
alternatives are never matched at the same time.

The single underscore (``_``) is not considered a ``NAME`` and is treated specially
as a `wildcard pattern`_.

Reminder: ``None``, ``False`` and ``True`` are keywords denoting
literals, not names.

.. _wildcard_pattern:

Wildcard Pattern
~~~~~~~~~~~~~~~~

Simplified syntax::

    wildcard_pattern: "_"

The single underscore (``_``) name is a special kind of pattern that always
matches but *never* binds::

    match data:
        case [_, _]:
            print("Some pair")
            print(_)  # Error!

Given that no binding is made, it can be used as many times as desired, unlike
capture patterns.

.. _constant_value_pattern:

Constant Value Patterns
~~~~~~~~~~~~~~~~~~~~~~~

Simplified syntax::

    constant_pattern: NAME ('.' NAME)+

This is used to match against constants and enum values.
Every dotted name in a pattern is looked up using normal Python name
resolution rules, and the value is used for comparison by equality with
the match subject (same as for literals)::

    from enum import Enum

    class Sides(str, Enum):
        SPAM = "Spam"
        EGGS = "eggs"
        ...

    match entree[-1]:
        case Sides.SPAM:  # Compares entree[-1] == Sides.SPAM.
            response = "Have you got anything without Spam?"
        case side:  # Assigns side = entree[-1].
            response = f"Well, could I have their Spam instead of the {side} then?"

Note that there is no way to use unqualified names as constant value
patterns (they always denote variables to be captured). See
`rejected ideas`_ for other syntactic alternatives that were
considered for constant value patterns.


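Since the comparison is by equality, a plain string can match an enum member
that mixes in ``str``. The following sketch spells out the example above
without ``match``; the function ``respond()`` is a hypothetical wrapper added
for illustration, and the trailing ``...`` of the original class body is
omitted.

```python
from enum import Enum


class Sides(str, Enum):
    SPAM = "Spam"
    EGGS = "eggs"


def respond(entree):
    side = entree[-1]
    if side == Sides.SPAM:      # case Sides.SPAM: (dotted name, compared by ==)
        return "Have you got anything without Spam?"
    # case side: (unqualified name, always matches and binds)
    return f"Well, could I have their Spam instead of the {side} then?"
```

Here ``respond(["egg", "bacon", "Spam"])`` takes the first branch even though
the last item is an ordinary ``str``, because ``"Spam" == Sides.SPAM`` is true.
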
.. _sequence_pattern:

Sequence Patterns
~~~~~~~~~~~~~~~~~

Simplified syntax::

    sequence_pattern:
        | '[' [values_pattern] ']'
        | '(' [value_pattern ',' [values_pattern]] ')'
    values_pattern: ','.value_pattern+ ','?
    value_pattern: '*' capture_pattern | pattern

A sequence pattern follows the same semantics as unpacking assignment.
Like unpacking assignment, both tuple-like and list-like syntax can be
used, with identical semantics. Each element can be an arbitrary
pattern; there may also be at most one ``*name`` pattern to catch all
remaining items::

    match collection:
        case 1, [x, *others]:
            print("Got 1 and a nested sequence")
        case (1, x):
            print(f"Got 1 and {x}")

To match a sequence pattern the subject must be an instance of
``collections.abc.Sequence``, and it cannot be any kind of string
(``str``, ``bytes``, ``bytearray``). It cannot be an iterator. For matching
on a specific collection class, see class patterns below.

The ``_`` wildcard can be starred to match sequences of varying lengths. For
example:

* ``[*_]`` matches a sequence of any length.
* ``(_, _, *_)`` matches any sequence of length two or more.
* ``["a", *_, "z"]`` matches any sequence of length two or more that starts with
  ``"a"`` and ends with ``"z"``.


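The subject requirements and the starred wildcard can be approximated with
explicit checks. This is a rough sketch under the rules stated in this
section only; both function names are hypothetical.

```python
from collections.abc import Sequence


def accepts_sequence_pattern(subject):
    # Must be a Sequence, but never a str, bytes or bytearray.
    # Iterators are not Sequence instances, so they are rejected too.
    return (isinstance(subject, Sequence)
            and not isinstance(subject, (str, bytes, bytearray)))


def matches_a_to_z(subject):
    # Hand-written analogue of ``case ["a", *_, "z"]:``.
    return (accepts_sequence_pattern(subject)
            and len(subject) >= 2
            and subject[0] == "a"
            and subject[-1] == "z")
```

Note that ``matches_a_to_z("az")`` is false even though the string starts
with ``"a"`` and ends with ``"z"``, because strings are excluded before any
element is inspected.
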
.. _mapping_pattern:

Mapping Patterns
~~~~~~~~~~~~~~~~

Simplified syntax::

    mapping_pattern: '{' [items_pattern] '}'
    items_pattern: ','.key_value_pattern+ ','?
    key_value_pattern:
        | (literal_pattern | constant_pattern) ':' or_pattern
        | '**' capture_pattern


A mapping pattern is a generalization of iterable unpacking to mappings.
Its syntax is similar to a dictionary display, but each key and value are
patterns: ``"{" (pattern ":" pattern)+ "}"``. A ``**rest`` pattern is also
allowed, to extract the remaining items. Only literal and constant value
patterns are allowed in key positions::

    import constants

    match config:
        case {"route": route}:
            process_route(route)
        case {constants.DEFAULT_PORT: sub_config, **rest}:
            process_config(sub_config, rest)

The subject must be an instance of ``collections.abc.Mapping``.
Extra keys in the subject are ignored even if ``**rest`` is not present.
This is different from sequence patterns, where extra items will cause a
match to fail. But mappings are actually different from sequences: they
have natural structural sub-typing behavior, i.e., passing a dictionary
with extra keys somewhere will likely just work.

For this reason, ``**_`` is invalid in mapping patterns; it would always be a
no-op that could be removed without consequence.

Matched key-value pairs must already be present in the mapping, and not created
on-the-fly by ``__missing__`` or ``__getitem__``. For example,
``collections.defaultdict`` instances will only match patterns with keys that
were already present when the ``match`` block was entered.


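One way to picture these rules is a lookup that never triggers
``__missing__``. The sketch below assumes the two-argument ``get()`` method
as the lookup mechanism; that choice is an assumption made for illustration,
and ``match_route()`` and ``_MISSING`` are hypothetical names.

```python
from collections import defaultdict
from collections.abc import Mapping

_MISSING = object()  # sentinel, so that a stored None still counts as present


def match_route(config):
    # Hand-written analogue of ``case {"route": route}:``.
    # Returns (matched, route).
    if not isinstance(config, Mapping):
        return (False, None)
    # get() with a default does not call __missing__, so a defaultdict
    # does not create the key as a side effect of the lookup.
    route = config.get("route", _MISSING)
    if route is _MISSING:
        return (False, None)
    return (True, route)   # extra keys in config are simply ignored
```

For an empty ``defaultdict(list)`` this reports no match and leaves the
mapping empty, while a dictionary with ``"route"`` and additional keys
matches.
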
.. _class_pattern:

Class Patterns
~~~~~~~~~~~~~~

Simplified syntax::

    class_pattern:
        | name_or_attr '(' ')'
        | name_or_attr '(' ','.pattern+ ','? ')'
        | name_or_attr '(' ','.keyword_pattern+ ','? ')'
        | name_or_attr '(' ','.pattern+ ',' ','.keyword_pattern+ ','? ')'
    keyword_pattern: NAME '=' or_pattern


A class pattern provides support for destructuring arbitrary objects.
There are two possible ways of matching on object attributes: by position
like ``Point(1, 2)``, and by name like ``Point(x=1, y=2)``. These
two can be combined, but a positional match cannot follow a match by name.
Each item in a class pattern can be an arbitrary pattern. A simple
example::

    match shape:
        case Point(x, y):
            ...
        case Rectangle(x0, y0, x1, y1, painted=True):
            ...

Whether a match succeeds or not is determined by the equivalent of an
``isinstance`` call. If the subject (``shape``, in the example) is not
an instance of the named class (``Point`` or ``Rectangle``), the match
fails. Otherwise, it continues (see details in the `runtime`_
section).

The named class must be an instance of ``type`` (that is, it must be a
class). It may be a single name
or a dotted name (e.g. ``some_mod.SomeClass`` or ``mod.pkg.Class``).
The leading name must not be ``_``, so e.g. ``_(...)`` and
``_.C(...)`` are invalid. Use ``object(foo=_)`` to check whether the
matched object has an attribute ``foo``.

By default, sub-patterns may only be matched by keyword for
user-defined classes. In order to support positional sub-patterns, a
custom ``__match_args__`` attribute is required.
The runtime allows matching against
arbitrarily nested patterns by chaining all of the instance checks and
attribute lookups appropriately.


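The chain of instance checks and attribute lookups can be sketched by hand
for a single pattern. The ``Point`` class, the helper name, and the choice of
a tuple of attribute names for ``__match_args__`` are assumptions made for
this illustration; this section does not define the attribute's format.

```python
_MISSING = object()  # sentinel for "attribute not present"


class Point:
    # Assumed format: positional sub-patterns map to these attribute names.
    __match_args__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


def match_point_x_zero(subject):
    # Hand-written analogue of ``case Point(x, 0):``.
    # Returns (matched, x).
    if not isinstance(subject, Point):          # the isinstance check
        return (False, None)
    first, second = Point.__match_args__        # position -> attribute name
    x = getattr(subject, first, _MISSING)
    y = getattr(subject, second, _MISSING)
    if x is _MISSING or y is _MISSING:
        return (False, None)
    if not (y == 0):                            # literal sub-pattern 0
        return (False, None)
    return (True, x)                            # capture sub-pattern x
```

This mirrors the three examples from the overview: ``Point(3, 0)`` matches
and binds ``x`` to ``3``, while ``[3, 0]`` and ``Point(3, 7)`` do not match.
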
Combining multiple patterns (OR patterns)
-----------------------------------------

Multiple alternative patterns can be combined into one using ``|``. This means
the whole pattern matches if at least one alternative matches.
Alternatives are tried from left to right and have a short-circuit property:
subsequent patterns are not tried once one has matched. Examples::

  match something:
      case 0 | 1 | 2:
          print("Small number")
      case [] | [_]:
          print("A short sequence")
      case str() | bytes():
          print("Something string-like")
      case _:
          print("Something else")

The alternatives may bind variables, as long as each alternative binds
the same set of variables (excluding ``_``). For example::

  match something:
      case 1 | x:  # Error!
          ...
      case x | 1:  # Error!
          ...
      case one := [1] | two := [2]:  # Error!
          ...
      case Foo(arg=x) | Bar(arg=x):  # Valid, both arms bind 'x'
          ...
      case [x] | x:  # Valid, both arms bind 'x'
          ...
|
||
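The binding rule for alternatives can be sketched as a check over the names
bound by each alternative. This is only an illustration under stated
assumptions: the function ``check_alternatives`` is hypothetical, and raising
``SyntaxError`` is an assumption made for the sake of the example::

  def check_alternatives(*bound_names):
      """Each argument is the set of names bound by one alternative."""
      # The wildcard "_" never binds, so it is ignored in the comparison.
      cleaned = [set(names) - {"_"} for names in bound_names]
      if any(names != cleaned[0] for names in cleaned[1:]):
          raise SyntaxError("alternative patterns bind different names")

  # Like ``Foo(arg=x) | Bar(arg=x)``: both alternatives bind "x".
  check_alternatives({"x"}, {"x"})

  # Like ``1 | x``: the first alternative binds nothing, the second binds "x".
  try:
      check_alternatives(set(), {"x"})
  except SyntaxError as exc:
      print(exc)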
|
||
|
||
.. _guards:

Guards
------

Each *top-level* pattern can be followed by a **guard** of the form
``if expression``. A case clause succeeds if the pattern matches and the guard
evaluates to a true value. For example::

  match input:
      case [x, y] if x > MAX_INT and y > MAX_INT:
          print("Got a pair of large numbers")
      case x if x > MAX_INT:
          print("Got a large number")
      case [x, y] if x == y:
          print("Got equal items")
      case _:
          print("Not an outstanding input")

If evaluating a guard raises an exception, it is propagated onwards rather
than failing the case clause. Names that appear in a pattern are bound before
the guard is evaluated. So this will work::

  values = [0]

  match values:
      case [x] if x:
          ...  # This is not executed
      case _:
          ...
  print(x)  # This will print "0"
|
||
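The binding behavior can be approximated with ordinary statements. The sketch
below is an assumed hand translation of the example above (simplified to a
``list`` check, with a hypothetical ``result`` variable), not generated code.
It shows that ``x`` is assigned before the guard is evaluated::

  values = [0]

  if isinstance(values, list) and len(values) == 1:
      x = values[0]              # the pattern [x] matched, so x is bound
      if x:                      # the guard fails because x == 0
          result = "first case"
      else:
          result = "fallback"    # corresponds to the wildcard case
  else:
      result = "fallback"

  print(x, result)  # 0 fallback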
|
||
Note that guards are not allowed for nested patterns, so that ``[x if x > 0]``
is a ``SyntaxError`` and ``1 | 2 if 3 | 4`` will be parsed as
``(1 | 2) if (3 | 4)``.
|
||
|
||
|
||
Walrus patterns
---------------

It is often useful to match a sub-pattern *and* bind the corresponding
value to a name. For example, it can be useful to write more efficient
matches, or simply to avoid repetition. To simplify such cases, any pattern
(other than the walrus pattern itself) can be preceded by a name and
the walrus operator (``:=``). For example::

  match get_shape():
      case Line(start := Point(x, y), end) if start == end:
          print(f"Zero length line at {x}, {y}")

The name on the left of the walrus operator can be used in a guard, in
the match suite, or after the match statement. However, the name will
*only* be bound if the sub-pattern succeeds. Another example::

  match group_shapes():
      case [], [point := Point(x, y), *other]:
          print(f"Got {point} in the second group")
          process_coordinates(x, y)
          ...

Technically, most such examples can be rewritten using guards and/or nested
match statements, but this will be less readable and/or will produce less
efficient code. Essentially, most of the arguments in PEP 572 apply here
equally.

The wildcard ``_`` is not a valid name here.
|
||
|
||
|
||
.. _runtime:

Runtime specification
=====================

The Match Protocol
------------------

The equivalent of an ``isinstance`` call is used to decide whether an
object matches a given class pattern and to extract the corresponding
attributes. Classes requiring different matching semantics (such as
duck-typing) can achieve this by defining ``__instancecheck__`` (a
pre-existing metaclass hook) or by using ``typing.Protocol``.

The procedure is as follows:

* The class object for ``Class`` in ``Class(<sub-patterns>)`` is
  looked up and ``isinstance(obj, Class)`` is called, where ``obj`` is
  the value being matched. If false, the match fails.

* Otherwise, if any sub-patterns are given in the form of positional
  or keyword arguments, these are matched from left to right, as
  follows. The match fails as soon as a sub-pattern fails; if all
  sub-patterns succeed, the overall class pattern match succeeds.

* If there are match-by-position items and the class has a
  ``__match_args__`` attribute, the item at position ``i``
  is matched against the value looked up by attribute
  ``__match_args__[i]``. For example, a pattern ``Point2d(5, 8)``,
  where ``Point2d.__match_args__ == ["x", "y"]``, is translated
  (approximately) into ``obj.x == 5 and obj.y == 8``.

* If there are more positional items than the length of
  ``__match_args__``, a ``TypeError`` is raised.

* If the ``__match_args__`` attribute is absent on the matched class,
  and one or more positional items appear in a match,
  ``TypeError`` is also raised. We don't fall back on
  using ``__slots__`` or ``__annotations__`` -- "In the face of ambiguity,
  refuse the temptation to guess."

* If there are any match-by-keyword items the keywords are looked up
  as attributes on the subject. If the lookup succeeds the value is
  matched against the corresponding sub-pattern. If the lookup fails,
  the match fails.

Such a protocol favors simplicity of implementation over flexibility and
performance. For other considered alternatives, see `extended matching`_.
|
||
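The procedure above can be sketched in plain Python. This is a minimal,
non-authoritative illustration rather than the actual implementation: the
helper name ``match_class_pattern`` is hypothetical, sub-patterns are
simplified to plain values compared with ``==``, and ``Point2d`` is an
assumed example class::

  def match_class_pattern(obj, cls, positional=(), keywords=None):
      """Return True if obj matches cls(*positional, **keywords)."""
      keywords = keywords or {}
      # Step 1: the equivalent of an isinstance() check.
      if not isinstance(obj, cls):
          return False
      # Step 2: positional items are translated to attribute names
      # through __match_args__, looked up on the class.
      names = []
      if positional:
          if not hasattr(cls, "__match_args__"):
              raise TypeError(f"{cls.__name__} has no __match_args__")
          match_args = cls.__match_args__
          if len(positional) > len(match_args):
              raise TypeError("too many positional sub-patterns")
          names = list(match_args[:len(positional)])
      # Step 3: match from left to right. A failed attribute lookup
      # fails the match instead of raising.
      missing = object()
      items = list(zip(names, positional)) + list(keywords.items())
      for name, expected in items:
          value = getattr(obj, name, missing)
          if value is missing or value != expected:
              return False
      return True

  class Point2d:
      __match_args__ = ["x", "y"]

      def __init__(self, x, y):
          self.x = x
          self.y = y

  p = Point2d(5, 8)
  print(match_class_pattern(p, Point2d, (5, 8)))          # True
  print(match_class_pattern(p, Point2d, (5,), {"y": 9}))  # False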
|
||
For the most commonly-matched built-in types (``bool``,
``bytearray``, ``bytes``, ``dict``, ``float``,
``frozenset``, ``int``, ``list``, ``set``, ``str``, and ``tuple``), a
single positional sub-pattern is allowed to be passed to
the call. Rather than being matched against any particular attribute
on the subject, it is instead matched against the subject itself. This
creates behavior that is useful and intuitive for these objects:

* ``bool(False)`` matches ``False`` (but not ``0``).
* ``tuple((0, 1, 2))`` matches ``(0, 1, 2)`` (but not ``[0, 1, 2]``).
* ``int(i)`` matches any ``int`` and binds it to the name ``i``.
|
||
|
||
|
||
Overlapping sub-patterns
------------------------

Certain classes of overlapping matches are detected at
runtime and will raise exceptions. In addition to basic checks
described in the previous subsection:

* The interpreter will check that two match items are not targeting the same
  attribute. For example, ``Point2d(1, 2, y=3)`` is an error.

* It will also check that a mapping pattern does not attempt to match
  the same key more than once.
|
||
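A rough sketch of the first check, assuming positional items are mapped to
attribute names through ``__match_args__``. The function name
``check_no_overlap`` is hypothetical, and the use of ``TypeError`` is an
assumption (the text above only says that an exception is raised)::

  def check_no_overlap(match_args, n_positional, keyword_names):
      """Raise if two items of a class pattern target the same attribute."""
      targets = list(match_args[:n_positional]) + list(keyword_names)
      seen = set()
      for name in targets:
          if name in seen:
              raise TypeError(f"multiple sub-patterns target attribute {name!r}")
          seen.add(name)

  # Point2d(1, 2, y=3): positions 0 and 1 map to "x" and "y", and "y" is
  # then targeted again by keyword.
  try:
      check_no_overlap(["x", "y"], 2, ["y"])
  except TypeError as exc:
      print(exc)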
|
||
|
||
Special attribute ``__match_args__``
------------------------------------

The ``__match_args__`` attribute is always looked up on the type
object named in the pattern. If present, it must be a list or tuple
of strings naming the allowed positional arguments.

In deciding what names should be available for matching, the
recommended practice is that class patterns should be the mirror of
construction; that is, the set of available names and their types
should resemble the arguments to ``__init__()``.

Only match-by-name will work by default, and classes should define
``__match_args__`` as a class attribute if they would like to support
match-by-position. Additionally, dataclasses and named tuples will
support match-by-position out of the box. See below for more details.
|
||
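As an illustration of the "mirror of construction" recommendation, a class
might list its ``__init__()`` parameters in ``__match_args__``. The
``Rectangle`` class and the ``positional_value`` helper below are
hypothetical and only demonstrate how a position maps to an attribute::

  class Rectangle:
      # Same names, in the same order, as the __init__() parameters.
      __match_args__ = ("width", "height")

      def __init__(self, width, height):
          self.width = width
          self.height = height

  def positional_value(cls, obj, index):
      """Return the attribute that the sub-pattern at ``index`` is matched against."""
      # __match_args__ is looked up on the class named in the pattern,
      # not on the instance.
      name = cls.__match_args__[index]
      return getattr(obj, name)

  r = Rectangle(3, 4)
  print(positional_value(Rectangle, r, 0))  # 3
  print(positional_value(Rectangle, r, 1))  # 4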
|
||
Exceptions and side effects
---------------------------

While matching each case, the ``match`` statement may trigger execution of other
functions (for example ``__getitem__()``, ``__len__()`` or
a property). Almost every exception caused by those propagates outside of the
match statement normally. The only case where an exception is not propagated is
an ``AttributeError`` raised while trying to look up an attribute while matching
attributes of a Class Pattern; that case results in just a matching failure,
and the rest of the statement proceeds normally.

The only side-effect carried out explicitly by the matching process is the binding of
names. However, the process relies on attribute access,
instance checks, ``len()``, equality and item access on the subject and some of
its components. It also evaluates constant value patterns and the left side of
class patterns. While none of those typically create any side-effects, some of
these objects could. This proposal intentionally leaves out any specification
of what methods are called or how many times. User code relying on that
behavior should be considered buggy.
|
||
|
||
The standard library
--------------------

To facilitate the use of pattern matching, several changes will be made to
the standard library:

* Namedtuples and dataclasses will have auto-generated ``__match_args__``.

* For dataclasses the order of attributes in the generated ``__match_args__``
  will be the same as the order of corresponding arguments in the generated
  ``__init__()`` method. This includes the situations where attributes are
  inherited from a superclass.

In addition, a systematic effort will be put into going through
existing standard library classes and adding ``__match_args__`` where
it looks beneficial.
|
||
|
||
|
||
.. _static checkers:

Static checkers specification
=============================

Exhaustiveness checks
---------------------

From a reliability perspective, experience shows that missing a case when
dealing with a set of possible data values leads to hard-to-debug issues,
thus forcing people to add safety asserts like this::

  def get_first(data: Union[int, list[int]]) -> int:
      if isinstance(data, list) and data:
          return data[0]
      elif isinstance(data, int):
          return data
      else:
          assert False, "should never get here"

PEP 484 specifies that static type checkers should support exhaustiveness in
conditional checks with respect to enum values. PEP 586 later generalized this
requirement to literal types.

This PEP further generalizes this requirement to
arbitrary patterns. A typical situation where this applies is matching an
expression with a union type::

  def classify(val: Union[int, Tuple[int, int], List[int]]) -> str:
      match val:
          case [x, y] if x > 0 and y > 0:
              return f"A pair of {x} and {y}"
          case [x, *other]:
              return f"A sequence starting with {x}"
          case int():
              return "Some integer"
          # Type-checking error: some cases unhandled.

The exhaustiveness checks should also apply where both pattern matching
and enum values are combined::

  from enum import Enum
  from typing import Union

  class Level(Enum):
      BASIC = 1
      ADVANCED = 2
      PRO = 3

  class User:
      name: str
      level: Level

  class Admin:
      name: str

  account: Union[User, Admin]

  match account:
      case Admin(name=name) | User(name=name, level=Level.PRO):
          ...
      case User(level=Level.ADVANCED):
          ...
      # Type-checking error: basic user unhandled

Obviously, no ``Matchable`` protocol (in terms of PEP 544) is needed, since
every class is matchable and therefore is subject to the checks specified
above.
|
||
|
||
|
||
Sealed classes as algebraic data types
--------------------------------------

Quite often it is desirable to apply exhaustiveness to a set of classes without
defining ad-hoc union types, which is itself fragile if a class is missing in
the union definition. A design pattern where a group of record-like classes is
combined into a union is popular in other languages that support pattern
matching and is known as algebraic data types [2]_.

We propose to add a special decorator class ``@sealed`` to the ``typing``
module [6]_, that will have no effect at runtime, but will indicate to static
type checkers that all subclasses (direct and indirect) of this class should
be defined in the same module as the base class.

The idea is that since all subclasses are known, the type checker can treat
the sealed base class as a union of all its subclasses. Together with
dataclasses this allows a clean and safe support of algebraic data types
in Python. Consider this example::

  from dataclasses import dataclass
  from typing import sealed

  @sealed
  class Node:
      ...

  class Expression(Node):
      ...

  class Statement(Node):
      ...

  @dataclass
  class Name(Expression):
      name: str

  @dataclass
  class Operation(Expression):
      left: Expression
      op: str
      right: Expression

  @dataclass
  class Assignment(Statement):
      target: str
      value: Expression

  @dataclass
  class Print(Statement):
      value: Expression

With such a definition, a type checker can safely treat ``Node`` as
``Union[Name, Operation, Assignment, Print]``, and also safely treat e.g.
``Expression`` as ``Union[Name, Operation]``. So this will result in a type
checking error in the below snippet, because ``Name`` is not handled (and a type
checker can give a useful error message)::

  def dump(node: Node) -> str:
      match node:
          case Assignment(target, value):
              return f"{target} = {dump(value)}"
          case Print(value):
              return f"print({dump(value)})"
          case Operation(left, op, right):
              return f"({dump(left)} {op} {dump(right)})"
|
||
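Since ``@sealed`` is only proposed here and does not exist in ``typing``, the
following sketch uses a hypothetical no-op stand-in to show the intended
runtime behavior, together with one way a ``dump`` function over expressions
could be made exhaustive using plain ``isinstance`` checks. The class
definitions are trimmed versions of the ones above::

  from dataclasses import dataclass

  def sealed(cls):
      # Hypothetical stand-in: no runtime effect, only a marker that a
      # static type checker could recognize.
      return cls

  @sealed
  class Node:
      ...

  class Expression(Node):
      ...

  @dataclass
  class Name(Expression):
      name: str

  @dataclass
  class Operation(Expression):
      left: Expression
      op: str
      right: Expression

  def dump(node: Expression) -> str:
      if isinstance(node, Name):  # the case missing in the snippet above
          return node.name
      elif isinstance(node, Operation):
          return f"({dump(node.left)} {node.op} {dump(node.right)})"
      raise TypeError(f"unexpected node: {node!r}")

  print(dump(Operation(Name("a"), "+", Name("b"))))  # (a + b)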
|
||
|
||
Type erasure
------------

Class patterns are subject to runtime type erasure. Namely, although one
can define a type alias ``IntQueue = Queue[int]`` so that a pattern like
``IntQueue()`` is syntactically valid, type checkers should reject such a
match::

  queue: Union[Queue[int], Queue[str]]
  match queue:
      case IntQueue():  # Type-checking error here
          ...

Note that the above snippet actually fails at runtime with the current
implementation of generic classes in the ``typing`` module, as well as
with builtin generic classes in the recently accepted PEP 585, because
they prohibit ``isinstance`` checks.
|
||
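The runtime failure can be observed directly with an ordinary ``isinstance``
call. This short check is an illustration only; the ``failed`` flag is a
name introduced for the example::

  from typing import List

  failed = False
  try:
      # Subscripted generics cannot be used in instance checks.
      isinstance([1, 2], List[int])
  except TypeError as exc:
      failed = True
      print("TypeError:", exc)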
|
||
To clarify, generic classes are not prohibited in general from participating
in pattern matching, just that their type parameters can't be explicitly
specified. It is still fine if sub-patterns or literals bind the type
variables. For example::

  from typing import Generic, TypeVar, Union

  T = TypeVar('T')

  class Result(Generic[T]):
      first: T
      other: list[T]

  result: Union[Result[int], Result[str]]

  match result:
      case Result(first=int()):
          ...  # Type of result is Result[int] here
      case Result(other=["foo", "bar", *rest]):
          ...  # Type of result is Result[str] here
|
||
|
||
|
||
Note about constants
--------------------

The fact that a capture pattern is always an assignment target may create unwanted
consequences when a user by mistake tries to "match" a value against
a constant instead of using the constant value pattern. As a result, at
runtime such a match will always succeed and moreover override the value of
the constant. It is important therefore that static type checkers warn about
such situations. For example::

  from typing import Final

  MAX_INT: Final = 2 ** 64

  value = 0

  match value:
      case MAX_INT:  # Type-checking error here: cannot assign to final name
          print("Got big number")
      case _:
          print("Something else")

Note that the CPython reference implementation also generates a
``SyntaxWarning`` message for this case.
|
||
|
||
|
||
Precise type checking of star matches
-------------------------------------

Type checkers should perform precise type checking of star items in pattern
matching giving them either a heterogeneous ``list[T]`` type, or
a ``TypedDict`` type as specified by PEP 589. For example::

  stuff: Tuple[int, str, str, float]

  match stuff:
      case a, *b, 0.5:
          # Here a is int and b is list[str]
          ...
|
||
|
||
|
||
Performance Considerations
==========================

Ideally, a ``match`` statement should have good runtime performance compared
to an equivalent chain of if-statements. Although the history of programming
languages is rife with examples of new features which increased engineer
productivity at the expense of additional CPU cycles, it would be
unfortunate if the benefits of ``match`` were counter-balanced by a significant
overall decrease in runtime performance.

Although this PEP does not specify any particular implementation strategy,
a few words about the prototype implementation and how it attempts to
maximize performance are in order.

Basically, the prototype implementation transforms all of the ``match``
statement syntax into equivalent if/else blocks - or more accurately, into
Python byte codes that have the same effect. In other words, all of the
logic for testing instance types, sequence lengths, mapping keys and
so on are inlined in place of the ``match``.
|
||
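To make the idea concrete, here is a hand-written approximation of such a
translation for a small, hypothetical ``match`` statement (shown in the
comment). It is a sketch of the general approach only, not the byte code that
the prototype emits, and the ``describe`` function is invented for the
example::

  from collections.abc import Sequence

  def describe(command):
      # Roughly equivalent to:
      #
      #   match command:
      #       case [action]: ...
      #       case [action, obj]: ...
      #       case _: ...
      #
      # The sequence check is computed once and reused by both cases.
      is_seq = isinstance(command, Sequence) and not isinstance(
          command, (str, bytes, bytearray))
      if is_seq and len(command) == 1:
          action = command[0]
          return f"action {action}"
      elif is_seq and len(command) == 2:
          action, obj = command
          return f"action {action} on {obj}"
      else:
          return "unrecognized"

  print(describe(["look"]))          # action look
  print(describe(["take", "lamp"]))  # action take on lamp
  print(describe("go"))              # unrecognized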
|
||
This is not the only possible strategy, nor is it necessarily the best.
For example, the instance checks could be memoized, especially
if there are multiple instances of the same class type but with different
arguments in a single match statement. It is also theoretically
possible for a future implementation to process case clauses or sub-patterns in
parallel using a decision tree rather than testing them one by one.
|
||
|
||
|
||
Backwards Compatibility
=======================

This PEP is fully backwards compatible: the ``match`` and ``case``
keywords are proposed to be (and stay!) soft keywords, so their use as
variable, function, class, module or attribute names is not impeded at
all.

This is important because ``match`` is the name of a popular and
well-known function and method in the ``re`` module, which we have no
desire to break or deprecate.
|
||
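For instance, code like the following, which uses ``match`` both as an
ordinary variable name and as a function of the ``re`` module, is expected to
keep working unchanged (a small illustration; the regular expression and the
input string are made up)::

  import re

  match = re.match(r"(\d+) apples", "42 apples")
  if match:
      count = int(match.group(1))
      print(count)  # 42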
|
||
The difference between hard and soft keywords is that hard keywords
are *always* reserved words, even in positions where they make no
sense (e.g. ``x = class + 1``), while soft keywords only get a special
meaning in context. Since PEP 617 the parser backtracks, which means that on
different attempts to parse a code fragment it could interpret a soft
keyword differently.

For example, suppose the parser encounters the following input::

  match [x, y]:

The parser first attempts to parse this as an expression statement.
It interprets ``match`` as a NAME token, and then considers
``[x, y]`` to be a double subscript. It then encounters the colon and has
to backtrack, since an expression statement cannot be followed by a
colon. The parser then backtracks to the start of the line and finds
that ``match`` is a soft keyword allowed in this position. It then
considers ``[x, y]`` to be a list expression. The colon then is just
what the parser expected, and the parse succeeds.
|
||
|
||
|
||
Impacts on third-party tools
============================

There are a lot of tools in the Python ecosystem that operate on Python
source code: linters, syntax highlighters, auto-formatters, and IDEs. These
will all need to be updated to include awareness of the ``match`` statement.

In general, these tools fall into one of two categories:

**Shallow** parsers don't try to understand the full syntax of Python, but
instead scan the source code for specific known patterns. IDEs, such as Visual
Studio Code, Emacs and TextMate, tend to fall in this category, since frequently
the source code is invalid while being edited, and a strict approach to parsing
would fail.

For these kinds of tools, adding knowledge of a new keyword is relatively
easy: just an addition to a table, or perhaps modification of a regular
expression.

**Deep** parsers understand the complete syntax of Python. An example of this
is the auto-formatter Black [9]_. A particular requirement with these kinds of
tools is that they not only need to understand the syntax of the current version
of Python, but older versions of Python as well.

The ``match`` statement uses a soft keyword, and it is one of the first major
Python features to take advantage of the capabilities of the new PEG parser. This
means that third-party parsers which are not 'PEG-compatible' will have a hard
time with the new syntax.

It has been noted that a number of these third-party tools leverage common parsing
libraries (Black for example uses a fork of the lib2to3 parser). It may be helpful
to identify widely-used parsing libraries (such as parso [10]_ and libCST [11]_)
and upgrade them to be PEG compatible.

However, since this work would need to be done not only for the match statement,
but for *any* new Python syntax that leverages the capabilities of the PEG parser,
it is considered out of scope for this PEP. (Although it is suggested that this
would make a fine Summer of Code project.)
|
||
|
||
|
||
Reference Implementation
========================

A `feature-complete CPython implementation
<https://github.com/brandtbucher/cpython/tree/patma>`_ is available on
GitHub.

An `interactive playground
<https://mybinder.org/v2/gh/gvanrossum/patma/master?urlpath=lab/tree/playground-622.ipynb>`_
based on the above implementation was created using Binder [12]_ and Jupyter [13]_.

Example Code
============

A small `collection of example code
<https://github.com/gvanrossum/patma/tree/master/examples>`_ is
available on GitHub.
|
||
|
||
|
||
.. _rejected ideas:

Rejected Ideas
==============

This general idea has been floating around for a pretty long time, and many
back and forth decisions were made. Here we summarize many alternative
paths that were taken but eventually abandoned.

Don't do this, pattern matching is hard to learn
------------------------------------------------

In our opinion, the proposed pattern matching is not more difficult than
adding ``isinstance()`` and ``getattr()`` to iterable unpacking. Also, we
believe the proposed syntax significantly improves readability for a wide
range of code patterns, by allowing one to express *what* one wants to do, rather
than *how* to do it. We hope the few real code snippets we included in the PEP
above illustrate this comparison well enough. For more real code examples
and their translations see Ref. [7]_.
|
||
|
||
|
||
Don't do this, use existing method dispatching mechanisms
---------------------------------------------------------

We recognize that some of the use cases for the ``match`` statement overlap
with what can be done with traditional object-oriented programming (OOP) design
techniques using class inheritance. The ability to choose alternate
behaviors based on testing the runtime type of a match subject might
even seem heretical to strict OOP purists.

However, Python has always been a language that embraces a variety of
programming styles and paradigms. Classic Python design idioms such as
"duck"-typing go beyond the traditional OOP model.

We believe that there are important use cases where the use of ``match`` results
in a cleaner and more maintainable architecture. These use cases tend to
be characterized by a number of features:

* Algorithms which cut across traditional lines of data encapsulation. If an
  algorithm is processing heterogeneous elements of different types (such as
  evaluating or transforming an abstract syntax tree, or doing algebraic
  manipulation of mathematical symbols), forcing the user to implement
  the algorithm as individual methods on each element type results in
  logic that is smeared across the entire codebase instead of being neatly
  localized in one place.
* Program architectures where the set of possible data types is relatively
  stable, but there is an ever-expanding set of operations to be performed
  on those data types. Doing this in a strict OOP fashion requires constantly
  adding new methods to both the base class and subclasses to support the new
  methods, "polluting" the base class with lots of very specialized method
  definitions, and causing widespread disruption and churn in the code. By
  contrast, in a ``match``-based dispatch, adding a new behavior merely
  involves writing a new ``match`` statement.
* OOP also does not handle dispatching based on the *shape* of an object, such
  as the length of a tuple, or the presence of an attribute -- instead any such
  dispatching decision must be encoded into the object's type. Shape-based
  dispatching is particularly interesting when it comes to handling "duck"-typed
  objects.

Where OOP is clearly superior is in the opposite case: where the set of possible
operations is relatively stable and well-defined, but there is an ever-growing
set of data types to operate on. A classic example of this is UI widget toolkits,
where there is a fixed set of interaction types (repaint, mouse click, keypress,
and so on), but the set of widget types is constantly expanding as developers
invent new and creative user interaction styles. Adding a new kind of widget
is a simple matter of writing a new subclass, whereas with a match-based approach
you end up having to add a new case clause to many widespread match statements.
We therefore don't recommend using ``match`` in such a situation.
|
||
|
||
|
||
Allow more flexible assignment targets instead
----------------------------------------------

There was an idea to instead just generalize the iterable unpacking to much
more general assignment targets, instead of adding a new kind of statement.
This concept is known in some other languages as "irrefutable matches". We
decided not to do this because inspection of real-life potential use cases
showed that in the vast majority of cases destructuring is related to an ``if``
condition. Also many of those are grouped in a series of exclusive choices.
|
||
|
||
|
||
Make it an expression
---------------------

In most other languages pattern matching is represented by an expression, not
a statement. But making it an expression would be inconsistent with other
syntactic choices in Python. All decision making logic is expressed almost
exclusively in statements, so we decided to not deviate from this.
|
||
|
||
|
||
Use a hard keyword
------------------

There were options to make ``match`` a hard keyword, or choose a different
keyword. Although using a hard keyword would simplify life for simple-minded
syntax highlighters, we decided not to use a hard keyword for several reasons:

* Most importantly, the new parser doesn't require us to do this. Unlike with
  ``async`` that caused hardships with being a soft keyword for a few releases,
  here we can make ``match`` a permanent soft keyword.

* ``match`` is so commonly used in existing code, that it would break almost
  every existing program and will put a burden to fix code on many people who
  may not even benefit from the new syntax.

* It is hard to find an alternative keyword that would not be commonly used
  in existing programs as an identifier, and would still clearly reflect the
  meaning of the statement.
|
||
|
||
|
||
Use ``as`` or ``|`` instead of ``case`` for case clauses
--------------------------------------------------------

The pattern matching proposed here is a combination of multi-branch control
flow (in line with ``switch`` in Algol-derived languages or ``cond`` in Lisp)
and object-deconstruction as found in functional languages. While the proposed
keyword ``case`` highlights the multi-branch aspect, alternative keywords such
as ``as`` would equally be possible, highlighting the deconstruction aspect.
``as`` or ``with``, for instance, also have the advantage of already being
keywords in Python. However, since ``case`` as a keyword can only occur as a
leading keyword inside a ``match`` statement, it is easy for a parser to
distinguish between its use as a keyword or as a variable.

Other variants would use a symbol like ``|`` or ``=>``, or go entirely without
a special marker.

Since Python is a statement-oriented language in the tradition of Algol, and as
each composite statement starts with an identifying keyword, ``case`` seemed to
be most in line with Python's style and traditions.
|
||
|
||
|
||
Use a flat indentation scheme
-----------------------------

There was an idea to use an alternative indentation scheme, for example where
every case clause would not be indented with respect to the initial ``match``
part::

  match expression:
  case pattern_1:
      ...
  case pattern_2:
      ...

The motivation is that although flat indentation saves some horizontal space,
it may look awkward to the eye of a Python programmer, because everywhere else
a colon is followed by an indent. This will also complicate life for
simple-minded code editors. Finally, the horizontal space issue can be
alleviated by allowing "half-indent" (i.e. two spaces instead of four) for
match statements.

In sample programs using ``match``, written as part of the development of this
PEP, a noticeable improvement in code brevity is observed, more than making up
for the additional indentation level.

Another proposal considered was to use flat indentation but put the
expression on the line after ``match:``, like this::

  match:
      expression
  case pattern_1:
      ...
  case pattern_2:
      ...

This was ultimately rejected because the first block would be a
novelty in Python's grammar: a block whose only content is a single
expression rather than a sequence of statements.
|
||
|
||
|
||
Alternatives for constant value pattern
|
||
---------------------------------------
|
||
|
||
This is probably the trickiest item. Matching against some pre-defined
|
||
constants is very common, but the dynamic nature of Python also makes it
|
||
ambiguous with capture patterns. Five other alternatives were considered:
|
||
|
||
* Use some implicit rules. For example if a name was defined in the global
|
||
scope, then it refers to a constant, rather than representing a
|
||
capture pattern::
|
||
|
||
# Here, the name "spam" must be defined in the global scope (and
|
||
# not shadowed locally). "side" must be local.
|
||
|
||
match entree[-1]:
|
||
case spam: ... # Compares entree[-1] == spam.
|
||
case side: ... # Assigns side = entree[-1].
|
||
|
||
This however can cause surprises and action at a distance if someone
|
||
defines an unrelated coinciding name before the match statement.
|
||
|
||
* Use a rule based on the case of a name. In particular, if the name
|
||
starts with a lowercase letter it would be a capture pattern, while if
|
||
it starts with uppercase it would refer to a constant::
|
||
|
||
      match entree[-1]:
          case SPAM: ...  # Compares entree[-1] == SPAM.
          case side: ...  # Assigns side = entree[-1].
|
||
|
||
This works well with the recommendations for naming constants from
|
||
PEP 8. The main objection is that there's no other part of core
|
||
Python where the case of a name is semantically significant.
|
||
In addition, Python allows identifiers to use different scripts,
|
||
many of which (e.g. CJK) don't have a case distinction.
|
||
|
||
* Use extra parentheses to indicate lookup semantics for a given name. For
|
||
example::
|
||
|
||
      match entree[-1]:
          case (spam): ...  # Compares entree[-1] == spam.
          case side: ...    # Assigns side = entree[-1].
|
||
|
||
This may be a viable option, but it can create some visual noise if used
|
||
often. Also honestly it looks pretty unusual, especially in nested contexts.
|
||
|
||
This also has the problem that we may want or need parentheses to
|
||
disambiguate grouping in patterns, e.g. in ``Point(x, y=(y :=
|
||
complex()))``.
|
||
|
||
* Introduce a special symbol, for example ``.``, ``?``, ``$``, or ``^`` to
|
||
indicate that a given name is a value to be matched against, not
|
||
to be assigned to. An earlier version of this proposal used a
|
||
leading-dot rule::
|
||
|
||
      match entree[-1]:
          case .spam: ...  # Compares entree[-1] == spam.
          case side: ...   # Assigns side = entree[-1].
|
||
|
||
While potentially useful, it introduces strange-looking new syntax
|
||
without making the pattern syntax any more expressive. Indeed,
|
||
named constants can be made to work with the existing rules by
|
||
converting them to ``Enum`` types, or enclosing them in their own
|
||
namespace (considered by the authors to be one honking great idea)::
|
||
|
||
      match entree[-1]:
          case Sides.SPAM: ...  # Compares entree[-1] == Sides.SPAM.
          case side: ...        # Assigns side = entree[-1].
|
||
|
||
If needed, the leading-dot rule (or a similar variant) could be
|
||
added back later with no backward-compatibility issues.
|
||
|
||
* There was also an idea to make lookup semantics the default, and require
|
||
``$`` or ``?`` to be used in capture patterns::
|
||
|
||
      match entree[-1]:
          case spam: ...   # Compares entree[-1] == spam.
          case side?: ...  # Assigns side = entree[-1].
|
||
|
||
There are a few issues with this:
|
||
|
||
* Capture patterns are more common in typical code, so it is
|
||
undesirable to require special syntax for them.
|
||
|
||
* The authors are not aware of any other language that adorns
|
||
captures in this way.
|
||
|
||
* None of the proposed syntaxes have any precedent in Python;
|
||
no other place in Python that binds names (e.g. ``import``,
|
||
``def``, ``for``) uses special marker syntax.
|
||
|
||
* It would break the syntactic parallels of the current grammar::
|
||
|
||
        match coords:
            case ($x, $y):
                return Point(x, y)  # Why not "Point($x, $y)"?
|
||
|
||
|
||
In the end, these alternatives were rejected because of the mentioned drawbacks.
|
||
|
||
|
||
Disallow float literals in patterns
|
||
-----------------------------------
|
||
|
||
Because of the inexactness of floats, an early version of this proposal
|
||
did not allow floating-point constants to be used as match patterns. Part
|
||
of the justification for this prohibition is that Rust does this.
|
||
|
||
However, during implementation, it was discovered that distinguishing between
|
||
float values and other types required extra code in the VM that would slow
|
||
matches generally. Given that Python and Rust are very different languages
|
||
with different user bases and underlying philosophies, it was felt that
|
||
allowing float literals would not cause too much harm, and would be less
|
||
surprising to users.
|
||
|
||
|
||
Range matching patterns
|
||
-----------------------
|
||
|
||
This would allow patterns such as ``1...6``. However, there are a host of
|
||
ambiguities:
|
||
|
||
* Is the range open, half-open, or closed? (I.e. is ``6`` included in the
|
||
above example or not?)
|
||
* Does the range match a single number, or a range object?
|
||
* Range matching is often used for character ranges ('a'...'z') but that
|
||
won't work in Python since there's no character data type, just strings.
|
||
* Range matching can be a significant performance optimization if you can
|
||
pre-build a jump table, but that's not generally possible in Python due
|
||
to the fact that names can be dynamically rebound.
|
||
|
||
Rather than creating a special-case syntax for ranges, it was decided
|
||
that allowing custom pattern objects (``InRange(0, 6)``) would be more flexible
|
||
and less ambiguous; however those ideas have been postponed for the time
|
||
being (See `deferred ideas`_).
|
||
|
||
|
||
Use dispatch dict semantics for matches
|
||
---------------------------------------
|
||
|
||
Implementations of the classic ``switch`` statement sometimes use a pre-computed
hash table instead of chained equality comparisons to gain some performance.
In the context of the ``match`` statement this is technically also possible for
matches against literal patterns. However, having subtly different semantics
for different kinds of patterns would be too surprising for a potentially
modest performance win.
|
||
|
||
We can still experiment with possible performance optimizations in this
|
||
direction if they will not cause semantic differences.
|
||
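One way to see the semantic difference is to compare the two strategies on a subject that supports ``==`` but not hashing. The sketch below uses plain functions rather than a ``match`` statement; ``by_chain``, ``by_table`` and ``Loose`` are hypothetical names invented for the illustration.

```python
def by_chain(subject):
    # Chained equality tests, tried from top to bottom.
    if subject == 1:
        return "one"
    if subject == 2:
        return "two"
    return None


TABLE = {1: "one", 2: "two"}


def by_table(subject):
    # Hash-table dispatch: hash(subject) is needed before any comparison.
    return TABLE.get(subject)


class Loose:
    """Hypothetical subject that compares equal to 1 but is unhashable."""

    def __eq__(self, other):
        return other == 1

    __hash__ = None
```

For ordinary integers both functions agree, but ``by_chain(Loose())`` returns ``"one"`` while ``by_table(Loose())`` raises ``TypeError``. An optimization would have to preserve the first behavior to avoid a visible difference.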
|
||
|
||
Use ``continue`` and ``break`` in case clauses.
|
||
-----------------------------------------------
|
||
|
||
Another rejected proposal was to define new meanings for ``continue``
|
||
and ``break`` inside of ``match``, which would have the following behavior:
|
||
|
||
* ``continue`` would exit the current case clause and continue matching
|
||
at the next case clause.
|
||
* ``break`` would exit the match statement.
|
||
|
||
However, there is a serious drawback to this proposal: if the ``match`` statement
|
||
is nested inside of a loop, the meanings of ``continue`` and ``break`` are now
|
||
changed. This may cause unexpected behavior during refactorings; also, an
|
||
argument can be made that there are other means to get the same behavior (such
|
||
as using guard conditions), and that in practice it's likely that the existing
|
||
behavior of ``continue`` and ``break`` is far more useful.
|
||
|
||
|
||
AND (``&``) patterns
|
||
--------------------
|
||
|
||
This proposal defines an OR-pattern (``|``) to match one of several alternates;
|
||
why not also an AND-pattern (``&``)? Especially given that some other languages
|
||
(F# for example) support this.
|
||
|
||
However, it's not clear how useful this would be. The semantics for matching
|
||
dictionaries, objects and sequences already incorporates an implicit 'and': all
|
||
attributes and elements mentioned must be present for the match to succeed. Guard
|
||
conditions can also support many of the use cases that a hypothetical 'and'
|
||
operator would be used for.
|
||
|
||
In the end, it was decided that this would make the syntax more complex without
|
||
adding a significant benefit.
|
||
|
||
|
||
Negative match patterns
|
||
-----------------------
|
||
|
||
A negation of a match pattern using the operator ``!`` as a prefix would match
|
||
exactly if the pattern itself does not match. For instance, ``!(3 | 4)``
|
||
would match anything except ``3`` or ``4``.
|
||
|
||
This was rejected because there is documented evidence [8]_ that this feature
is either rarely useful (in languages which support it) or used as double negation
``!!`` to control variable scopes and prevent variable bindings (which does
not apply to Python). It can also be simulated using guard conditions.
|
||
|
||
|
||
Check exhaustiveness at runtime
|
||
-------------------------------
|
||
|
||
The question is what to do if no case clause has a matching pattern, and
|
||
there is no default case. An earlier version of the proposal specified that
|
||
the behavior in this case would be to throw an exception rather than
|
||
silently falling through.
|
||
|
||
The arguments back and forth were many, but in the end the EIBTI (Explicit
|
||
Is Better Than Implicit) argument won out: it's better to have the programmer
|
||
explicitly throw an exception if that is the behavior they want.
|
||
|
||
For cases such as sealed classes and enums, where the patterns are all known
|
||
to be members of a discrete set, `static checkers`_ can warn about missing
|
||
patterns.
|
||
|
||
|
||
Type annotations for pattern variables
|
||
--------------------------------------
|
||
|
||
The proposal was to combine patterns with type annotations::
|
||
|
||
    match x:
        case [a: int, b: str]: print(f"An int {a} and a string {b}:")
        case [a: int, b: int, c: int]: print("Three ints", a, b, c)
        ...
|
||
|
||
This idea has a lot of problems. For one, the colon can only
|
||
be used inside of brackets or parens, otherwise the syntax becomes
|
||
ambiguous. And because Python disallows ``isinstance()`` checks
|
||
on generic types, type annotations containing generics will not
|
||
work as expected.
|
||
|
||
|
||
Allow ``*rest`` in class patterns
|
||
---------------------------------
|
||
|
||
It was proposed to allow ``*rest`` in a class pattern, giving a
|
||
variable to be bound to all positional arguments at once (similar to
|
||
its use in unpacking assignments). It would provide some symmetry
|
||
with sequence patterns. But it might be confused with a feature to
|
||
provide the *values* for all positional arguments at once. And there
|
||
seems to be no practical need for it, so it was scrapped. (It could
|
||
easily be added at a later stage if a need arises.)
|
||
|
||
Disallow ``_.a`` in constant value patterns
|
||
------------------------------------------------------
|
||
|
||
The first public draft said that the initial name in a constant value
|
||
pattern must not be ``_`` because ``_`` has a special meaning in
|
||
pattern matching, so this would be invalid::
|
||
|
||
case _.a: ...
|
||
|
||
(However, ``a._`` would be legal and load the attribute with name
|
||
``_`` of the object ``a`` as usual.)
|
||
|
||
There was some pushback against this on python-dev (some people have a
|
||
legitimate use for ``_`` as an important global variable, esp. in
|
||
i18n) and the only reason for this prohibition was to prevent some
|
||
user confusion. But it's not the hill to die on, so the restriction was dropped.
|
||
|
||
Use some other token as wildcard
|
||
--------------------------------
|
||
|
||
It has been proposed to use ``...`` (i.e., the ellipsis token) or
|
||
``*`` (star) as a wildcard. However, both these look as if an
|
||
arbitrary number of items is omitted::
|
||
|
||
case [a, ..., z]: ...
|
||
case [a, *, z]: ...
|
||
|
||
Both look like they would match a sequence of two or more items,
|
||
capturing the first and last values.
|
||
|
||
In addition, if ``*`` were to be used as the wildcard character, we
|
||
would have to come up with some other way to capture the rest of a
|
||
sequence, currently spelled like this::
|
||
|
||
case [first, second, *rest]: ...
|
||
|
||
Using an ellipsis would also be more confusing in documentation and
|
||
examples, where ``...`` is routinely used to indicate something
|
||
obvious or irrelevant. (Yes, this would also be an argument against
|
||
the other uses of ``...`` in Python, but that water is already under
|
||
the bridge.)
|
||
|
||
Another proposal was to use ``?``. This could be acceptable, although
|
||
it would require modifying the tokenizer.
|
||
|
||
Also, ``_`` is already used
|
||
as a throwaway target in other contexts, and this use is pretty
|
||
similar. This example is from ``difflib.py`` in the stdlib::
|
||
|
||
for tag, _, _, j1, j2 in group: ...
|
||
|
||
Perhaps the most convincing argument is that ``_`` is used as the
|
||
wildcard in every other language we've looked at supporting pattern
|
||
matching: C#, Elixir, Erlang, F#, Haskell, Mathematica, OCaml, Ruby,
|
||
Rust, Scala, and Swift. Now, in general, we should not be concerned
|
||
too much with what another language does, since Python is clearly
|
||
different from all these languages. However, if there is such an
|
||
overwhelming and strong consensus, Python should not go out of its way
|
||
to do something completely different -- particularly given that ``_``
|
||
works well in Python and is already in use as a throwaway target.
|
||
|
||
Note that ``_`` is not assigned to by patterns -- this avoids
|
||
conflicts with the use of ``_`` as a marker for translatable strings
|
||
and an alias for ``gettext.gettext``, as recommended by the
|
||
``gettext`` module documentation.
|
||
|
||
Use some other syntax instead of ``|`` for OR patterns
|
||
------------------------------------------------------
|
||
|
||
A few alternatives to using ``|`` to separate the alternatives in OR
|
||
patterns have been proposed. Instead of::
|
||
|
||
    case 401|403|404:
        print("Some HTTP error")
|
||
|
||
the following proposals have been fielded:
|
||
|
||
- Use a comma::
|
||
|
||
      case 401, 403, 404:
          print("Some HTTP error")
|
||
|
||
This looks too much like a tuple -- we would have to find a
|
||
different way to spell tuples, and the construct would have to be
|
||
parenthesized inside the argument list of a class pattern. In
|
||
general, commas already have many different meanings in Python, we
|
||
shouldn't add more.
|
||
|
||
- Allow stacked cases::
|
||
|
||
      case 401:
      case 403:
      case 404:
          print("Some HTTP error")
|
||
|
||
This is how this would be done in C, using its fall-through
|
||
semantics for cases. However, we don't want to mislead people into
|
||
thinking that ``match``/``case`` uses fall-through semantics (which
|
||
are a common source of bugs in C). Also, this would be a novel
|
||
indentation pattern, which might make it harder to support in IDEs
|
||
and such (it would break the simple rule "add an indentation level
|
||
after a line ending in a colon"). Finally, this wouldn't support
|
||
OR patterns nested inside other patterns.
|
||
|
||
- Use ``case in`` followed by a comma-separated list::
|
||
|
||
      case in 401, 403, 404:
          print("Some HTTP error")
|
||
|
||
This wouldn't work for OR patterns nested inside other patterns,
|
||
like::
|
||
|
||
      case Point(0|1, 0|1):
          print("A corner of the unit square")
|
||
|
||
- Use the ``or`` keyword::
|
||
|
||
      case 401 or 403 or 404:
          print("Some HTTP error")
|
||
|
||
This could work, and the readability is not too different from using
|
||
``|``. Some users expressed a preference for ``or`` because they
|
||
associate ``|`` with bitwise OR. However:
|
||
|
||
1. Many other languages that have pattern matching use ``|`` (the
|
||
list includes Elixir, Erlang, F#, Mathematica, OCaml, Ruby, Rust,
|
||
and Scala).
|
||
2. ``|`` is shorter, which may contribute to the readability of
|
||
nested patterns like ``Point(0|1, 0|1)``.
|
||
3. Some people mistakenly believe that ``|`` has the wrong priority;
|
||
but since patterns don't support other operators it has the same
|
||
priority as in expressions.
|
||
4. Python users use ``or`` very frequently, and may build an
|
||
impression that it is strongly associated with Boolean
|
||
short-circuiting.
|
||
5. ``|`` is used between alternatives in regular expressions
|
||
and in EBNF grammars (like Python's own).
|
||
6. ``|`` is not just used for bitwise OR -- it's used for set unions,
|
||
dict merging (:pep:`584`) and is being considered as an
|
||
alternative to ``typing.Union`` (:pep:`604`).
|
||
7. ``|`` works better as a visual separator, especially between
|
||
strings. Compare::
|
||
|
||
case "spam" or "eggs" or "cheese":
|
||
|
||
to::
|
||
|
||
case "spam" | "eggs" | "cheese":
|
||
|
||
Add an ``else`` clause
|
||
----------------------
|
||
|
||
We decided not to add an ``else`` clause for several reasons.
|
||
|
||
- It is redundant, since we already have ``case _:``
|
||
|
||
- There will forever be confusion about the indentation level of the
|
||
``else:`` -- should it align with the list of cases or with the
|
||
``match`` keyword?
|
||
|
||
- Completionist arguments like "every other statement has one" are
|
||
false -- only those statements have an ``else`` clause where it adds
|
||
new functionality.
|
||
|
||
|
||
.. _deferred ideas:
|
||
|
||
Deferred Ideas
|
||
==============
|
||
|
||
There were a number of proposals to extend the matching syntax that we
|
||
decided to postpone for a possible future PEP. These fall into the realm of
|
||
"cool idea but not essential", and it was felt that it might be better to
|
||
acquire some real-world data on how the match statement will be used in
|
||
practice before moving forward with some of these proposals.
|
||
|
||
Note that in each case, the idea was judged to be a "two-way door",
|
||
meaning that there should be no backwards-compatibility issues with adding
|
||
these features later.
|
||
|
||
One-off syntax variant
|
||
----------------------
|
||
|
||
While inspecting some code-bases that may benefit the most from the proposed
|
||
syntax, it was found that single clause matches would be used relatively often,
|
||
mostly for various special-casing. In other languages this is supported in
|
||
the form of one-off matches. We proposed to support such one-off matches too::
|
||
|
||
    if match value as pattern [and guard]:
        ...
|
||
|
||
or, alternatively, without the ``if``::
|
||
|
||
    match value as pattern [if guard]:
        ...
|
||
|
||
as equivalent to the following expansion::
|
||
|
||
    match value:
        case pattern [if guard]:
            ...
|
||
|
||
To illustrate how this will benefit readability, consider this (slightly
|
||
simplified) snippet from real code::
|
||
|
||
    if isinstance(node, CallExpr):
        if (isinstance(node.callee, NameExpr) and len(node.args) == 1 and
                isinstance(node.args[0], NameExpr)):
            call = node.callee.name
            arg = node.args[0].name
            ...  # Continue special-casing 'call' and 'arg'
    ...  # Follow with common code
|
||
|
||
This can be rewritten in a more straightforward way as::
|
||
|
||
    if match node as CallExpr(callee=NameExpr(name=call), args=[NameExpr(name=arg)]):
        ...  # Continue special-casing 'call' and 'arg'
    ...  # Follow with common code
|
||
|
||
This one-off form would not allow ``elif match`` statements, as it was only
|
||
meant to handle a single pattern case. It was intended to be a special case
|
||
of a ``match`` statement, not a special case of an ``if`` statement::
|
||
|
||
    if match value_1 as pattern_1 [and guard_1]:
        ...
    elif match value_2 as pattern_2 [and guard_2]:  # Not allowed
        ...
    elif match value_3 as pattern_3 [and guard_3]:  # Not allowed
        ...
    else:  # Also not allowed
        ...
|
||
|
||
This would defeat the purpose of one-off matches as a complement to exhaustive
|
||
full matches - it's better and clearer to use a full match in this case.
|
||
|
||
Similarly, ``if not match`` would not be allowed, since ``match ... as ...`` is not
|
||
an expression. Nor do we propose a ``while match`` construct present in some languages
|
||
with pattern matching, since although it may be handy, it will likely be used
|
||
rarely.
|
||
|
||
Other pattern-based constructions
|
||
---------------------------------
|
||
|
||
Many other languages supporting pattern-matching use it as a basis for multiple
|
||
language constructs, including a matching operator, a generalized form
|
||
of assignment, a filter for loops, a method for synchronizing communication,
|
||
or specialized if statements. Some of these were mentioned in the discussion
|
||
of the first draft. Another question asked was why this particular form (joining
|
||
binding and conditional selection) was chosen while other forms were not.
|
||
|
||
Introducing more uses of patterns would be too bold and premature given the
|
||
limited experience we have using patterns, and would make this proposal too
|
||
complicated. The statement as presented provides a form of the feature that
|
||
is sufficiently general to be useful while being self-contained, and without
|
||
having a massive impact on the syntax and semantics of the language as a whole.
|
||
|
||
After some experience with this feature, the community may have a better
|
||
feeling for what other uses of pattern matching could be valuable in Python.
|
||
|
||
Algebraic matching of repeated names
|
||
------------------------------------
|
||
|
||
A technique occasionally seen in functional languages like Erlang and Elixir is
|
||
to use a match variable multiple times in the same pattern::
|
||
|
||
    match value:
        case Point(x, x):
            print("Point is on a diagonal!")
|
||
|
||
The idea here is that the first appearance of ``x`` would bind the value
|
||
to the name, and subsequent occurrences would verify that the incoming
|
||
value was equal to the value previously bound. If the value was not equal,
|
||
the match would fail.
|
||
|
||
However, there are a number of subtleties involved with mixing load-store
|
||
semantics for capture patterns. For the moment, we decided to make repeated
|
||
use of names within the same pattern an error; we can always relax this
|
||
restriction later without affecting backwards compatibility.
|
||
|
||
Note that you **can** use the same name more than once in alternate choices::
|
||
|
||
    match value:
        case x | [x]:
            # etc.
|
||
|
||
|
||
.. _extended matching:
|
||
|
||
Custom matching protocol
|
||
------------------------
|
||
|
||
During the initial design discussions for this PEP, there were a lot of ideas
|
||
thrown around about custom matchers. There were a couple of motivations for
|
||
this:
|
||
|
||
* Some classes might want to expose a different set of "matchable" names
|
||
than the actual class properties.
|
||
* Some classes might have properties that are expensive to calculate, and
|
||
therefore shouldn't be evaluated unless the match pattern actually needed
|
||
access to them.
|
||
* There were ideas for exotic matchers such as ``IsInstance()``,
|
||
``InRange()``, ``RegexMatchingGroup()`` and so on.
|
||
* In order for built-in types and standard library classes to be able
|
||
to support matching in a reasonable and intuitive way, it was believed
|
||
that these types would need to implement special matching logic.
|
||
|
||
These customized match behaviors would be controlled by a special
|
||
``__match__`` method on the class name. There were two competing variants:
|
||
|
||
* A 'full-featured' match protocol which would pass in not only
|
||
the subject to be matched, but detailed information about
|
||
which attributes the specified pattern was interested in.
|
||
* A simplified match protocol, which only passed in the subject value,
|
||
and which returned a "proxy object" (which in most cases could be
|
||
just the subject) containing the matchable attributes.
|
||
|
||
Here's an example of one version of the more complex protocol proposed::
|
||
|
||
    match expr:
        case BinaryOp(left=Number(value=x), op=op, right=Number(value=y)):
            ...

    from types import PatternObject
    BinaryOp.__match__(
        (),
        {
            "left": PatternObject(Number, (), {"value": ...}, -1, False),
            "op": ...,
            "right": PatternObject(Number, (), {"value": ...}, -1, False),
        },
        -1,
        False,
    )
|
||
|
||
One drawback of this protocol is that the arguments to ``__match__``
|
||
would be expensive to construct, and could not be pre-computed due to
|
||
the fact that, because of the way names are bound, there are no real
|
||
constants in Python. It also meant that the ``__match__`` method would
|
||
have to re-implement much of the logic of matching which would otherwise
|
||
be implemented in C code in the Python VM. As a result, this option would
|
||
perform poorly compared to an equivalent ``if``-statement.
|
||
|
||
The simpler protocol suffered from the fact that although it was more
|
||
performant, it was much less flexible, and did not allow for many of
|
||
the creative custom matchers that people were dreaming up.
|
||
|
||
Late in the design process, however, it was realized that the need for
|
||
a custom matching protocol was much less than anticipated. Virtually
|
||
all the realistic (as opposed to fanciful) use cases brought up could
|
||
be handled by the built-in matching behavior, although in a few cases
|
||
an extra guard condition was required to get the desired effect.
|
||
|
||
Moreover, it turned out that none of the standard library classes really
|
||
needed any special matching support other than an appropriate
|
||
``__match_args__`` property.
|
||
|
||
The decision to postpone this feature came with a realization that this is
|
||
not a one-way door; that a more flexible and customizable matching protocol
|
||
can be added later, especially as we gain more experience with real-world
|
||
use cases and actual user needs.
|
||
|
||
The authors of this PEP expect that the ``match`` statement will evolve
|
||
over time as usage patterns and idioms evolve, in a way similar to what
|
||
other "multi-stage" PEPs have done in the past. When this happens, the
|
||
extended matching issue can be revisited.
|
||
|
||
|
||
Parameterized Matching Syntax
|
||
-----------------------------
|
||
|
||
(Also known as "Class Instance Matchers".)
|
||
|
||
This is another variant of the "custom match classes" idea that would allow
|
||
diverse kinds of custom matchers mentioned in the previous section -- however,
|
||
instead of using an extended matching protocol, it would be achieved by
|
||
introducing an additional pattern type with its own syntax. This pattern type
|
||
would accept two distinct sets of parameters: one set which consists of the
|
||
actual parameters passed into the pattern object's constructor, and another
|
||
set representing the binding variables for the pattern.
|
||
|
||
The ``__match__`` method of these objects could use the constructor parameter
|
||
values in deciding what was a valid match.
|
||
|
||
This would allow patterns such as ``InRange<0, 6>(value)``, which would match
|
||
a number in the range 0..6 and assign the matched value to 'value'. Similarly,
|
||
one could have a pattern which tests for the existence of a named group in
|
||
a regular expression match result (different meaning of the word 'match').
|
||
|
||
Although there is some support for this idea, there was a lot of bikeshedding
|
||
on the syntax (there are not a lot of attractive options available)
|
||
and no clear consensus was reached, so it was decided that for now, this
|
||
feature is not essential to the PEP.
|
||
|
||
|
||
Pattern Utility Library
|
||
-----------------------
|
||
|
||
Both of the previous ideas would be accompanied by a new Python standard
|
||
library module which would contain a rich set of useful matchers.
|
||
However, it is not really possible to implement such a library without
|
||
adopting one of the extended pattern proposals given in the previous sections,
|
||
so this idea is also deferred.
|
||
|
||
|
||
Acknowledgments
|
||
===============
|
||
|
||
We are grateful to the following individuals (among many
others) for helping out during various phases of the writing of this
PEP:
|
||
|
||
- Gregory P. Smith
|
||
- Jim Jewett
|
||
- Mark Shannon
|
||
- Nate Lust
|
||
- Taine Zhao
|
||
|
||
|
||
Version History
|
||
===============
|
||
|
||
1. Initial version
|
||
|
||
2. Substantial rewrite, including:
|
||
|
||
- Minor clarifications, grammar and typo corrections
|
||
- Rename various concepts
|
||
- Additional discussion of rejected ideas, including:
|
||
|
||
- Why we choose ``_`` for wildcard patterns
|
||
- Why we choose ``|`` for OR patterns
|
||
- Why we choose not to use special syntax for capture variables
|
||
- Why this pattern matching operation and not others
|
||
|
||
- Clarify exception and side effect semantics
|
||
- Clarify partial binding semantics
|
||
- Drop restriction on use of ``_`` in load contexts
|
||
- Drop the default single positional argument being the whole
|
||
subject except for a handful of built-in types
|
||
- Simplify behavior of ``__match_args__``
|
||
- Drop the ``__match__`` protocol (moved to `deferred ideas`_)
|
||
- Drop ``ImpossibleMatchError`` exception
|
||
- Drop leading dot for loads (moved to `deferred ideas`_)
|
||
- Reworked the initial sections (everything before `syntax`_)
|
||
- Added an overview of all the types of patterns before the
|
||
detailed description
|
||
- Added simplified syntax next to the description of each pattern
|
||
- Separate description of the wildcard from capture patterns
|
||
- Added Daniel F Moisset as sixth co-author
|
||
|
||
References
|
||
==========
|
||
|
||
.. [1]
|
||
https://en.wikipedia.org/wiki/Pattern_matching
|
||
|
||
.. [2]
|
||
https://en.wikipedia.org/wiki/Algebraic_data_type
|
||
|
||
.. [3]
|
||
https://doc.rust-lang.org/reference/patterns.html
|
||
|
||
.. [4]
|
||
https://docs.scala-lang.org/tour/pattern-matching.html
|
||
|
||
.. [5]
|
||
https://docs.python.org/3/library/dataclasses.html
|
||
|
||
.. [6]
|
||
https://docs.python.org/3/library/typing.html
|
||
|
||
.. [7]
|
||
https://github.com/gvanrossum/patma/blob/master/EXAMPLES.md
|
||
|
||
.. [8]
|
||
https://dl.acm.org/doi/abs/10.1145/2480360.2384582
|
||
|
||
.. [9]
|
||
https://black.readthedocs.io/en/stable/
|
||
|
||
.. [10]
|
||
https://github.com/davidhalter/parso
|
||
|
||
.. [11]
|
||
https://github.com/Instagram/LibCST
|
||
|
||
.. [12]
|
||
https://mybinder.org
|
||
|
||
.. [13]
|
||
https://jupyter.org
|
||
|
||
|
||
.. _Appendix A:
|
||
|
||
Appendix A -- Full Grammar
|
||
==========================
|
||
|
||
Here is the full grammar for ``match_stmt``. This is an additional
|
||
alternative for ``compound_stmt``. It should be understood that
|
||
``match`` and ``case`` are soft keywords, i.e. they are not reserved
|
||
words in other grammatical contexts (including at the start of a line
|
||
if there is no colon where expected). By convention, hard keywords
|
||
use single quotes while soft keywords use double quotes.
|
||
|
||
Other notation used beyond standard EBNF:
|
||
|
||
- ``SEP.RULE+`` is shorthand for ``RULE (SEP RULE)*``
|
||
- ``!RULE`` is a negative lookahead assertion
|
||
|
||
::
|
||
|
||
    match_expr:
        | star_named_expression ',' star_named_expressions?
        | named_expression
    match_stmt: "match" match_expr ':' NEWLINE INDENT case_block+ DEDENT
    case_block: "case" patterns [guard] ':' block
    guard: 'if' named_expression
    patterns: value_pattern ',' [values_pattern] | pattern
    pattern: walrus_pattern | or_pattern
    walrus_pattern: NAME ':=' or_pattern
    or_pattern: '|'.closed_pattern+
    closed_pattern:
        | capture_pattern
        | literal_pattern
        | constant_pattern
        | group_pattern
        | sequence_pattern
        | mapping_pattern
        | class_pattern
    capture_pattern: NAME !('.' | '(' | '=')
    literal_pattern:
        | signed_number !('+' | '-')
        | signed_number '+' NUMBER
        | signed_number '-' NUMBER
        | strings
        | 'None'
        | 'True'
        | 'False'
    constant_pattern: attr !('.' | '(' | '=')
    group_pattern: '(' patterns ')'
    sequence_pattern: '[' [values_pattern] ']' | '(' ')'
    mapping_pattern: '{' items_pattern? '}'
    class_pattern:
        | name_or_attr '(' ')'
        | name_or_attr '(' ','.pattern+ ','? ')'
        | name_or_attr '(' ','.keyword_pattern+ ','? ')'
        | name_or_attr '(' ','.pattern+ ',' ','.keyword_pattern+ ','? ')'
    signed_number: NUMBER | '-' NUMBER
    attr: name_or_attr '.' NAME
    name_or_attr: attr | NAME
    values_pattern: ','.value_pattern+ ','?
    items_pattern: ','.key_value_pattern+ ','?
    keyword_pattern: NAME '=' or_pattern
    value_pattern: '*' capture_pattern | pattern
    key_value_pattern:
        | (literal_pattern | constant_pattern) ':' or_pattern
        | '**' capture_pattern
|
||
|
||
|
||
Copyright
|
||
=========
|
||
|
||
This document is placed in the public domain or under the
|
||
CC0-1.0-Universal license, whichever is more permissive.
|
||
|
||
|
||
|
||
..
|
||
Local Variables:
|
||
mode: indented-text
|
||
indent-tabs-mode: nil
|
||
sentence-end-double-space: t
|
||
fill-column: 70
|
||
coding: utf-8
|
||
End:
|