PEP: 622
Title: Structural Pattern Matching
Version: $Revision$
Last-Modified: $Date$
Author: Brandt Bucher <brandtbucher@gmail.com>,
        Daniel F Moisset <dfmoisset@gmail.com>,
        Tobias Kohn <kohnt@tobiaskohn.ch>,
        Ivan Levkivskyi <levkivskyi@gmail.com>,
        Guido van Rossum <guido@python.org>,
        Talin <viridia@gmail.com>
BDFL-Delegate:
Discussions-To: Python-Dev <python-dev@python.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 23-Jun-2020
Python-Version: 3.10
Post-History: 23-Jun-2020, 8-Jul-2020
Resolution:

Abstract
========

This PEP proposes to add a pattern matching statement to Python,
inspired by similar syntax found in Scala and many other languages.

The pattern syntax builds on Python’s existing syntax for sequence
unpacking (e.g., ``a, b = value``), but is wrapped in a ``match``
statement which compares its subject to several different “shapes”
until one is found that fits. In addition to specifying the shape of a
sequence to be unpacked, patterns can also specify the shape to be a
mapping with specific keys, an instance of a given class with (optionally) specific
attributes, a specific value, or a wildcard. Patterns can be composed
in several ways.

Syntactically, a ``match`` statement contains a *subject* expression
and one or more ``case`` clauses, where each case clause specifies a
pattern (the overall shape to be matched), an optional “guard” (a
condition to be checked if the pattern matches), and a code block to
be executed if the case clause is selected.

The rest of the PEP motivates why we believe pattern matching makes a
good addition to Python, explains our design choices, and contains a
precise syntactic and runtime specification. We also give guidance for
static type checkers (and one small addition to the ``typing`` module)
and discuss the main objections and alternatives that have been
brought up during extensive discussion of the proposal, both within
the group of authors and in the python-dev community. Finally, we
discuss some possible extensions that might be considered in the
future, once the community has ample experience with the currently
proposed syntax and semantics.

Overview
========

Since patterns are a new syntactical category with their own rules
and special cases, and since they mix input (given values) and output
(captured variables) in novel ways, they require a bit of getting used
to. It is the experience of the authors that this happens quickly when
a brief introduction to the basic concepts such as the following is
presented. Note that this section is not intended to be complete or
perfectly accurate.

A new syntactic construct called *pattern* is
introduced. Syntactically, patterns look like a subset of expressions;
the following are patterns:

- ``[first, second, *rest]``
- ``Point2d(x, 0)``
- ``{"name": "Bruce", "age": age}``
- ``42``

The above look like examples of object construction. A constructor
takes some values as parameters and builds an object from those
components. But as patterns, the above mean the inverse operation of
construction, which we call *destructuring*: it takes a subject value
and extracts its components. The syntactic similarity between
construction and destructuring is intentional and follows the existing
Pythonic style which makes assignment targets (write contexts) look
like expressions (read contexts). Pattern matching never creates
objects, in the same way that ``[a, b] = my_list`` doesn't create a
new ``[a, b]`` list, nor does it read the values of ``a`` and ``b``.

The intuition we are trying to build in users as they learn this is
that matching a pattern to a subject binds the free variables (if any)
to subject components in a way that reflects the original
subject when read as an expression. During this process,
the structure of the pattern may not fit the subject, in which case
the matching *fails*. For example, matching the pattern
``Point2d(x, 0)`` to the subject ``Point2d(3, 0)`` successfully matches
and binds ``x`` to ``3``. However, if the subject is ``[3, 0]`` the match
fails because a ``list`` is not a ``Point2d``. And if the subject is
``Point2d(3, 3)`` the match fails because its second coordinate is not
``0``.

The ``match`` statement tries to match each of the
patterns in its ``case`` clauses with a single subject. At the first
successful match, the variables in the pattern are assigned and a
corresponding block is executed. Each of the multiple branches of this
conditional statement can also have a boolean condition as a *guard*.

Here's an example of a match statement, used to define a function
building 3D points that can accept as input either tuples of size 2 or
3, or existing (2D or 3D) points::

    def make_point_3d(pt):
        match pt:
            case (x, y):
                return Point3d(x, y, 0)
            case (x, y, z):
                return Point3d(x, y, z)
            case Point2d(x, y):
                return Point3d(x, y, 0)
            case Point3d(_, _, _):
                return pt
            case _:
                raise TypeError("not a point we support")

Writing this function in the traditional fashion would require several
``isinstance()`` checks, one or two ``len()`` calls, and a more
convoluted control flow. While the ``match`` version translates into
similar code under the hood, to a reader familiar with patterns it is
much clearer.

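For comparison, a traditional version might look roughly like the sketch
below. ``Point2d`` and ``Point3d`` are not defined anywhere in this PEP; they
are modelled here as small dataclasses purely for illustration, and the exact
order of the checks is an assumption rather than a description of what the
``match`` statement compiles to.

```python
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass
class Point2d:  # hypothetical stand-in; not defined by the PEP
    x: float
    y: float


@dataclass
class Point3d:  # hypothetical stand-in; not defined by the PEP
    x: float
    y: float
    z: float


def make_point_3d(pt):
    # Sequence patterns accept any non-string sequence, not only tuples.
    if isinstance(pt, Sequence) and not isinstance(pt, (str, bytes, bytearray)):
        if len(pt) == 2:
            x, y = pt
            return Point3d(x, y, 0)
        if len(pt) == 3:
            x, y, z = pt
            return Point3d(x, y, z)
    elif isinstance(pt, Point2d):
        return Point3d(pt.x, pt.y, 0)
    elif isinstance(pt, Point3d):
        return pt
    # Sequences of any other length and all other objects end up here.
    raise TypeError("not a point we support")
```

Even in this small case the shape checks (type, then length) and the
extraction of components are spread over several statements, which is the
bookkeeping that a pattern expresses in a single line.
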
Rationale and Goals
===================

Python programs frequently need to handle data which varies in type,
presence of attributes/keys, or number of elements. Typical examples
are operating on nodes of a mixed structure like an AST, handling UI
events of different types, processing structured input (like
structured files or network messages), or “parsing” arguments for a
function that can accept different combinations of types and numbers
of parameters. In fact, the classic 'visitor' pattern is an example of this,
done in an OOP style -- but matching makes it much less tedious to write.

Much of the code to do so tends to consist of complex chains of nested
``if``/``elif`` statements, including multiple calls to ``len()``,
``isinstance()`` and index/key/attribute access. Inside those branches
users sometimes need to destructure the data further to extract the
required component values, which may be nested several objects deep.

Pattern matching as present in many other languages provides an
elegant solution to this problem. These languages range from statically
compiled functional languages like F# and Haskell, via mixed-paradigm
languages like Scala [4]_ and Rust [3]_, to dynamic languages like Elixir
and Ruby; pattern matching is also under consideration for JavaScript. We
are indebted to these languages for guiding the way to Pythonic pattern
matching, as Python is indebted to so many other languages for many of its
features: many basic syntactic features were inherited from C,
exceptions from Modula-3, classes were inspired by C++, slicing came
from Icon, regular expressions from Perl, decorators resemble Java
annotations, and so on.

The usual logic for operating on heterogeneous data can be summarized
in the following way:

- Some analysis is done on the *shape* (type and components) of the
  data: This could involve ``isinstance()`` or ``len()`` calls and/or extracting
  components (via indexing or attribute access) which are checked for
  specific values or conditions.
- If the shape is as expected, some more components are possibly
  extracted and some operation is done using the extracted values.

Take for example `this piece of the Django web framework
<https://github.com/django/django/blob/5166097d7c80cab757e44f2d02f3d148fbbc2ff6/django/db/models/enums.py#L13>`_::

    if (
        isinstance(value, (list, tuple)) and
        len(value) > 1 and
        isinstance(value[-1], (Promise, str))
    ):
        *value, label = value
        value = tuple(value)
    else:
        label = key.replace('_', ' ').title()

We can see the shape analysis of the ``value`` at the top, followed
by the destructuring inside.

Note that shape analysis here involves checking the types both of the
container and of one of its components, and some checks on its number
of elements. Once we match the shape, we need to decompose the
sequence. With the proposal in this PEP, we could rewrite that code
into this::

    match value:
        case [*v, label := (Promise() | str())] if v:
            value = tuple(v)
        case _:
            label = key.replace('_', ' ').title()

This syntax makes much more explicit which formats are possible for
the input data, and which components are extracted from where. You can
see a pattern similar to list unpacking, but also type checking: the
``Promise()`` pattern is not an object construction, but represents
anything that's an instance of ``Promise``. The pattern operator ``|``
separates alternative patterns (not unlike regular expressions or EBNF
grammars), and ``_`` is a wildcard. (Note that the match syntax used
here will accept user-defined sequences, as well as lists and tuples.)

On some occasions, extraction of information is not as relevant as
identifying structure. Take the following example from the
`Python standard library
<https://github.com/python/cpython/blob/c4cacc8/Lib/lib2to3/fixer_util.py#L158>`_::

    def is_tuple(node):
        if isinstance(node, Node) and node.children == [LParen(), RParen()]:
            return True
        return (isinstance(node, Node)
                and len(node.children) == 3
                and isinstance(node.children[0], Leaf)
                and isinstance(node.children[1], Node)
                and isinstance(node.children[2], Leaf)
                and node.children[0].value == "("
                and node.children[2].value == ")")

This code determines the "shape" of the data
without doing significant extraction. It is not very easy to
read, and the intended shape that it is trying to match is not
evident. Compare with the updated code using the proposed syntax::

    def is_tuple(node: Node) -> bool:
        match node:
            case Node(children=[LParen(), RParen()]):
                return True
            case Node(children=[Leaf(value="("), Node(), Leaf(value=")")]):
                return True
            case _:
                return False

Note that the proposed code will work without any modifications to the
definition of ``Node`` and other classes here. As shown in the
examples above, the proposal supports not just unpacking sequences, but
also doing ``isinstance`` checks (like ``LParen()`` or ``str()``),
looking into object attributes (``Leaf(value="(")`` for example) and
comparisons with literals.

That last feature helps with some kinds of code which look more like
the "switch" statement as present in other languages::

    match response.status:
        case 200:
            do_something(response.data)  # OK
        case 301 | 302:
            retry(response.location)  # Redirect
        case 401:
            retry(auth=get_credentials())  # Login first
        case 426:
            sleep(DELAY)  # Server is swamped, try after a bit
            retry()
        case _:
            raise RequestError("we couldn't get the data")

Although this will work, it's not necessarily what the proposal is
focused on, and the new syntax has been designed to best support the
destructuring scenarios.

See the `syntax`_ sections below for a more detailed specification.

We propose that destructuring objects can be customized by a new
special ``__match_args__`` attribute. As part of this PEP we specify
the general API and its implementation for some standard library
classes (including named tuples and dataclasses). See the `runtime`_
section below.

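As a rough picture of the idea only (the runtime section of this PEP is the
authoritative description and takes precedence over this sketch), a class
could list the attribute names that positional sub-patterns refer to, and a
matcher could look those attributes up in order. The ``Point2d`` class and the
``positional_values`` helper below are invented for this example, and
``__match_args__`` is assumed here to be a tuple of attribute names.

```python
class Point2d:  # hypothetical class, used only for this sketch
    __match_args__ = ("x", "y")  # positional sub-patterns map to these names

    def __init__(self, x, y):
        self.x = x
        self.y = y


def positional_values(subject, cls, count):
    """Return the first `count` positional components, or None on mismatch."""
    if not isinstance(subject, cls):
        return None
    names = getattr(cls, "__match_args__", ())
    if count > len(names):
        raise TypeError("too many positional sub-patterns")
    missing = object()
    values = [getattr(subject, name, missing) for name in names[:count]]
    if any(value is missing for value in values):
        return None
    return values


# Roughly what ``Point2d(x, 0)`` has to find out about its subject.
components = positional_values(Point2d(3, 0), Point2d, 2)
matched = components is not None and components[1] == 0
```

Here ``components`` holds the values that the sub-patterns ``x`` and ``0``
would then be matched against, in that order.
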
Finally, we aim to provide comprehensive support for static type
checkers and similar tools. For this purpose, we propose to introduce
a ``@typing.sealed`` class decorator that will be a no-op at runtime
but will indicate to static tools that all sub-classes of this class
must be defined in the same module. This will allow effective static
exhaustiveness checks, and together with dataclasses, will provide
basic support for algebraic data types [2]_. See the `static checkers`_
section for more details.

.. _syntax:

Syntax and Semantics
====================

Patterns
--------

The **pattern** is a new syntactical construct that could be considered a loose
generalization of assignment targets. The key properties of a pattern are what
types and shapes of subjects it accepts, what variables it captures and how
it extracts them from the subject. For example, the pattern ``[a, b]`` matches
only sequences of exactly 2 elements, extracting the first element into ``a``
and the second one into ``b``.

This PEP defines several types of patterns. These are certainly not the
only possible ones, so the design decision was made to choose a subset of
functionality that is useful now but conservative. More patterns can be added
later as this feature gets more widespread use. See the `rejected ideas`_
and `deferred ideas`_ sections for more details.

The patterns listed here are described in more detail below, but summarized
together in this section for simplicity:

- A **literal pattern** is useful to filter constant values in a structure.
  It looks like a Python literal (including some values like ``True``,
  ``False`` and ``None``). It only matches objects equal to the literal, and
  never binds.
- A **capture pattern** looks like ``x`` and is equivalent to an identical
  assignment target: it always matches and binds the variable
  with the given (simple) name.
- The **wildcard pattern** is a single underscore: ``_``. It always matches,
  but does not capture any variable (which prevents interference with other
  uses for ``_`` and allows for some optimizations).
- A **constant value pattern** works like the literal but for certain named
  constants. Note that it must be a qualified (dotted) name, given the possible
  ambiguity with a capture pattern. It looks like ``Color.RED`` and
  only matches values equal to the corresponding value. It never binds.
- A **sequence pattern** looks like ``[a, *rest, b]`` and is similar to
  a list unpacking. An important difference is that the elements nested
  within it can be any kind of patterns, not just names or sequences.
  It matches only sequences of appropriate length, as long as all the sub-patterns
  also match. It makes all the bindings of its sub-patterns.
- A **mapping pattern** looks like ``{"user": u, "emails": [*es]}``. It matches
  mappings with at least the set of provided keys, and if all the
  sub-patterns match their corresponding values. It binds whatever the
  sub-patterns bind while matching with the values corresponding to the keys.
  Adding ``**rest`` at the end of the pattern to capture extra items is allowed.
- A **class pattern** is similar to the above but matches attributes instead
  of keys. It looks like ``datetime.date(year=y, day=d)``. It matches
  instances of the given type, having at least the specified
  attributes, as long as the attributes match with the corresponding
  sub-patterns. It binds whatever the sub-patterns bind when matching with the
  values of the given attributes. An optional protocol also allows matching
  positional arguments.
- An **OR pattern** looks like ``[*x] | {"elems": [*x]}``. It matches if any
  of its sub-patterns match. It uses the binding for the leftmost pattern
  that matched.
- A **walrus pattern** looks like ``d := datetime(year=2020, month=m)``. It
  matches only if its sub-pattern also matches. It binds whatever the
  sub-pattern match does, and also binds the named variable to the entire
  object.

The ``match`` statement
-----------------------

A simplified, approximate grammar for the proposed syntax is::

    ...
    compound_statement:
        | if_stmt
        ...
        | match_stmt
    match_stmt: "match" expression ':' NEWLINE INDENT case_block+ DEDENT
    case_block: "case" pattern [guard] ':' block
    guard: 'if' expression
    pattern: walrus_pattern | or_pattern
    walrus_pattern: NAME ':=' or_pattern
    or_pattern: closed_pattern ('|' closed_pattern)*
    closed_pattern:
        | literal_pattern
        | capture_pattern
        | wildcard_pattern
        | constant_pattern
        | sequence_pattern
        | mapping_pattern
        | class_pattern

See `Appendix A`_ for the full, unabridged grammar. The simplified grammars in
this section are meant to help the reader, not to serve as a full specification.

We propose that the match operation should be a statement, not an expression.
Although in many languages it is an expression, being a statement better suits
the general logic of Python syntax. See `rejected ideas`_ for more discussion.
The allowed patterns are described in detail below in the `patterns`_
subsection.

The ``match`` and ``case`` keywords are proposed to be soft keywords,
so that they are recognized as keywords at the beginning of a match
statement or case block respectively, but are allowed to be used in
other places as variable or argument names.

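Keeping these words usable as identifiers matters because existing code
already uses them as names, for instance with the standard ``re`` module. The
snippet below is ordinary present-day Python (the variable values are
arbitrary); under the soft-keyword rule described above it would be expected
to keep working unchanged.

```python
import re

# "match" and "case" used as ordinary names, as in much existing code.
match = re.match(r"(\d+)-(\d+)", "10-20")
case = "lower"

# The match object is used like any other variable.
low, high = (int(group) for group in match.groups())
```

Neither name appears at the start of a statement followed by a subject
expression and a colon, so nothing here would be read as a match statement.
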
The proposed indentation structure is as follows::

    match some_expression:
        case pattern_1:
            ...
        case pattern_2:
            ...

Here, ``some_expression`` represents the value that is being matched against,
which will be referred to hereafter as the *subject* of the match.

Match semantics
---------------

The proposed large-scale semantics for choosing the match is to choose the first
matching pattern and execute the corresponding suite. The remaining patterns
are not tried. If there are no matching patterns, the statement 'falls
through', and execution continues at the following statement.

Essentially this is equivalent to a chain of ``if ... elif ... else``
statements. Note that unlike for the previously proposed ``switch`` statement,
the pre-computed dispatch dictionary semantics does not apply here.

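The selection rule can be pictured with ordinary functions. In the sketch
below each case is modelled as a hypothetical ``(test, action)`` pair, where
``test`` stands in for a pattern plus its optional guard; ``run_match`` and
the sample cases are invented for illustration and are not part of the
proposal.

```python
def run_match(subject, cases):
    """Run the action of the first case whose test accepts the subject."""
    for test, action in cases:
        if test(subject):
            return action(subject)  # later cases are never tried
    return None  # no case matched: fall through without raising


cases = [
    (lambda s: s == 0, lambda s: "zero"),
    (lambda s: isinstance(s, int), lambda s: "some int"),
    (lambda s: s == 1, lambda s: "never reached for ints"),
]

results = [run_match(value, cases) for value in (0, 1, "text")]
```

The cases are tried strictly in order, so the third one is shadowed by the
second for every integer subject, and the string subject falls through all
three without an error.
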
There is no ``default`` or ``else`` case - instead the special wildcard
``_`` can be used (see the section on `wildcard_pattern`_) as a final
'catch-all' pattern.

Name bindings made during a successful pattern match outlive the executed suite
and can be used after the match statement. This follows the logic of other
Python statements that can bind names, such as the ``for`` loop and the ``with``
statement. For example::

    match shape:
        case Point(x, y):
            ...
        case Rectangle(x, y, _, _):
            ...
    print(x, y)  # This works

During failed pattern matches, some sub-patterns may succeed. For example,
while matching the value ``[0, 1, 2]`` with the pattern ``(0, x, 1)``, the
sub-pattern ``x`` may succeed if the list elements are matched from left to right.
The implementation may choose to either make persistent bindings for those
partial matches or not. User code including a ``match`` statement should not rely
on the bindings being made for a failed match, but also shouldn't assume that
variables are unchanged by a failed match. This part of the behavior is
left intentionally unspecified so different implementations can add
optimizations, and to prevent introducing semantic restrictions that could
limit the extensibility of this feature.

Note that some pattern types below define more specific rules about when
the binding is made.

.. _patterns:

Allowed patterns
----------------

We introduce the proposed syntax gradually. Here we start from the main
building blocks. The following patterns are supported:


.. _literal_pattern:

Literal Patterns
~~~~~~~~~~~~~~~~

Simplified syntax::

    literal_pattern:
        | number
        | string
        | 'None'
        | 'True'
        | 'False'

A literal pattern consists of a simple literal like a string, a number,
a Boolean literal (``True`` or ``False``), or ``None``::

    match number:
        case 0:
            print("Nothing")
        case 1:
            print("Just one")
        case 2:
            print("A couple")
        case -1:
            print("One less than nothing")
        case 1-1j:
            print("Good luck with that...")

A literal pattern uses equality with the literal on the right-hand side, so that
in the above example ``number == 0`` and then possibly ``number == 1``, etc.
will be evaluated. Note that although technically negative numbers
are represented using unary minus, they are considered
literals for the purpose of pattern matching. Unary plus is not allowed.
Binary plus and minus are allowed only to join a real number and an imaginary
number to form a complex number, such as ``1+1j``.

Note that because equality (``__eq__``) is used, and because of the equivalence
between Booleans and the integers ``0`` and ``1``, there is no
practical difference between the following two::

    case True:
        ...

    case 1:
        ...

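Because the comparison is plain ``==``, literal patterns inherit Python's
usual cross-type equality rules. The snippet below only evaluates those
comparisons directly and does not use the proposed syntax; the dictionary and
its labels are arbitrary names chosen for the example.

```python
# Each entry evaluates "subject == literal" for one subject/literal pair.
comparisons = {
    "True vs 1": True == 1,
    "False vs 0": False == 0,
    "1.0 vs 1": 1.0 == 1,
    "complex(1, -1) vs 1-1j": complex(1, -1) == 1 - 1j,
    "'1' vs 1": "1" == 1,
}
```

Every entry except the last is ``True``, which suggests that a subject of
``1.0`` would also select ``case 1``, while the string ``"1"`` would not.
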
Triple-quoted strings are supported. Raw strings and byte strings
are supported. F-strings are not allowed (since in general they are not
really literals).


.. _capture_pattern:

Capture Patterns
~~~~~~~~~~~~~~~~

Simplified syntax::

    capture_pattern: NAME

A capture pattern serves as an assignment target for the matched expression::

    match greeting:
        case "":
            print("Hello!")
        case name:
            print(f"Hi {name}!")

Only a single name is allowed (a dotted name is a constant value pattern).
A capture pattern always succeeds. A capture pattern appearing in a scope makes
the name local to that scope. For example, using ``name`` after the above
snippet may raise ``UnboundLocalError`` rather than ``NameError``, if
the ``""`` case clause was taken::

    match greeting:
        case "":
            print("Hello!")
        case name:
            print(f"Hi {name}!")
    if name == "Santa":      # <-- might raise UnboundLocalError
        ...                  # but works fine if greeting was not empty

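The same scoping rule already applies to ordinary assignment, which may make
the behaviour easier to picture. The function below is a hand-written
analogue using ``if``/``else`` rather than the proposed syntax; ``greet`` and
``outcome`` are names invented for the example.

```python
def greet(greeting):
    # Plain-assignment analogue of the two case clauses above: `name` is
    # assigned somewhere in the function, so it is local on every path.
    if greeting == "":
        pass
    else:
        name = greeting
    return name == "Santa"


try:
    greet("")
    outcome = "no error"
except UnboundLocalError:
    outcome = "UnboundLocalError"
```

With an empty greeting the assignment is skipped, so reading ``name`` fails
with ``UnboundLocalError``; with any other greeting the function returns
normally.
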
While matching against each case clause, a name may be bound at most
once; having two capture patterns with coinciding names is an error::

    match data:
        case [x, x]:  # Error!
            ...

Note: one can still match on a collection with equal items using `guards`_.
Also, ``[x, y] | Point(x, y)`` is a legal pattern because the two
alternatives are never matched at the same time.

The single underscore (``_``) is not considered a ``NAME`` and is treated
specially as a `wildcard pattern`_.

Reminder: ``None``, ``False`` and ``True`` are keywords denoting
literals, not names.

.. _wildcard_pattern:

Wildcard Pattern
~~~~~~~~~~~~~~~~

Simplified syntax::

    wildcard_pattern: "_"

The single underscore (``_``) name is a special kind of pattern that always
matches but *never* binds::

    match data:
        case [_, _]:
            print("Some pair")
            print(_)  # Error!

Given that no binding is made, it can be used as many times as desired, unlike
capture patterns.

.. _constant_value_pattern:

Constant Value Patterns
~~~~~~~~~~~~~~~~~~~~~~~

Simplified syntax::

    constant_pattern: NAME ('.' NAME)+

This is used to match against constants and enum values.
Every dotted name in a pattern is looked up using normal Python name
resolution rules, and the value is used for comparison by equality with
the match subject (same as for literals)::

    from enum import Enum

    class Sides(str, Enum):
        SPAM = "Spam"
        EGGS = "eggs"
        ...

    match entree[-1]:
        case Sides.SPAM:  # Compares entree[-1] == Sides.SPAM.
            response = "Have you got anything without Spam?"
        case side:  # Assigns side = entree[-1].
            response = f"Well, could I have their Spam instead of the {side} then?"

Note that there is no way to use unqualified names as constant value
patterns (they always denote variables to be captured). See
`rejected ideas`_ for other syntactic alternatives that were
considered for constant value patterns.

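Since the comparison is by equality, whether a raw string subject matches an
enum member depends on how the enum defines ``==``. The sketch below spells
out the comparison with an ordinary ``if``; ``respond`` and ``PlainSides`` are
names invented for this illustration, and only ``Sides`` comes from the
example above.

```python
from enum import Enum


class Sides(str, Enum):
    SPAM = "Spam"
    EGGS = "eggs"


class PlainSides(Enum):  # hypothetical variant without the str mixin
    SPAM = "Spam"


def respond(entree):
    side = entree[-1]
    # Stand-in for ``case Sides.SPAM``: look the name up, then compare with ==.
    if side == Sides.SPAM:
        return "Have you got anything without Spam?"
    # Stand-in for ``case side``: always matches, the value is already bound.
    return f"Well, could I have their Spam instead of the {side} then?"
```

Because ``Sides`` mixes in ``str``, the plain string ``"Spam"`` compares equal
to ``Sides.SPAM``. A member of ``PlainSides`` does not compare equal to its raw
value, so with such an enum the same string subject would reach the capture
pattern instead.
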
.. _sequence_pattern:
|
|
|
|
|
|
2020-06-29 15:11:20 -04:00
|
|
|
|
Sequence Patterns
|
|
|
|
|
~~~~~~~~~~~~~~~~~
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
2020-07-07 19:00:55 -04:00
|
|
|
|
Simplified syntax::
|
|
|
|
|
|
|
|
|
|
sequence_pattern:
|
2020-07-07 22:35:05 -04:00
|
|
|
|
| '[' [values_pattern] ']'
|
|
|
|
|
| '(' [value_pattern ',' [values pattern]] ')'
|
2020-07-07 19:00:55 -04:00
|
|
|
|
values_pattern: ','.value_pattern+ ','?
|
|
|
|
|
value_pattern: '*' capture_pattern | pattern
|
|
|
|
|
|
2020-06-23 11:27:36 -04:00
|
|
|
|
A sequence pattern follows the same semantics as unpacking assignment.
|
|
|
|
|
Like unpacking assignment, both tuple-like and list-like syntax can be
|
|
|
|
|
used, with identical semantics. Each element can be an arbitrary
|
|
|
|
|
pattern; there may also be at most one ``*name`` pattern to catch all
|
|
|
|
|
remaining items::
|
|
|
|
|
|
|
|
|
|
match collection:
|
|
|
|
|
case 1, [x, *others]:
|
|
|
|
|
print("Got 1 and a nested sequence")
|
|
|
|
|
case (1, x):
|
|
|
|
|
print(f"Got 1 and {x}")
|
|
|
|
|
|
2020-07-06 00:29:52 -04:00
|
|
|
|
To match a sequence pattern the subject must be an instance of
|
2020-06-23 11:27:36 -04:00
|
|
|
|
``collections.abc.Sequence``, and it cannot be any kind of string
|
|
|
|
|
(``str``, ``bytes``, ``bytearray``). It cannot be an iterator. For matching
|
|
|
|
|
on a specific collection class, see class pattern below.
|
|
|
|
|
|
|
|
|
|
The ``_`` wildcard can be starred to match sequences of varying lengths. For
|
|
|
|
|
example:
|
|
|
|
|
|
|
|
|
|
* ``[*_]`` matches a sequence of any length.
|
|
|
|
|
* ``(_, _, *_)`` matches any sequence of length two or more.
|
|
|
|
|
* ``["a", *_, "z"]`` matches any sequence of length two or more that starts with
|
|
|
|
|
``"a"`` and ends with ``"z"``.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
.. _mapping_pattern:
|
|
|
|
|
|
2020-06-29 15:11:20 -04:00
|
|
|
|
Mapping Patterns
|
|
|
|
|
~~~~~~~~~~~~~~~~
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
2020-07-07 19:00:55 -04:00
|
|
|
|
Simplified syntax::
|
|
|
|
|
|
|
|
|
|
mapping_pattern: '{' [items_pattern] '}'
|
|
|
|
|
items_pattern: ','.key_value_pattern+ ','?
|
|
|
|
|
key_value_pattern:
|
|
|
|
|
| (literal_pattern | constant_pattern) ':' or_pattern
|
|
|
|
|
| '**' capture_pattern
|
|
|
|
|
|
|
|
|
|
|
2020-06-23 11:27:36 -04:00
|
|
|
|
A mapping pattern is a generalization of iterable unpacking to mappings.
Its syntax is similar to a dictionary display, but each key and value is a
pattern: ``"{" (pattern ":" pattern)+ "}"``. A ``**name`` pattern is also
|
|
|
|
|
allowed, to extract the remaining items. Only literal and constant value
|
|
|
|
|
patterns are allowed in key positions::
|
|
|
|
|
|
|
|
|
|
import constants
|
|
|
|
|
|
|
|
|
|
match config:
|
|
|
|
|
case {"route": route}:
|
|
|
|
|
process_route(route)
|
|
|
|
|
case {constants.DEFAULT_PORT: sub_config, **rest}:
|
|
|
|
|
process_config(sub_config, rest)
|
|
|
|
|
|
2020-07-06 00:29:52 -04:00
|
|
|
|
The subject must be an instance of ``collections.abc.Mapping``.
|
|
|
|
|
Extra keys in the subject are ignored even if ``**rest`` is not present.
|
2020-06-23 11:27:36 -04:00
|
|
|
|
This is different from sequence patterns, where extra items will cause a
|
|
|
|
|
match to fail. But mappings are actually different from sequences: they
|
|
|
|
|
have natural structural sub-typing behavior, i.e., passing a dictionary
|
|
|
|
|
with extra keys somewhere will likely just work.
|
|
|
|
|
|
|
|
|
|
For this reason, ``**_`` is invalid in mapping patterns; it would always be a
|
|
|
|
|
no-op that could be removed without consequence.
|
|
|
|
|
|
|
|
|
|
Matched key-value pairs must already be present in the mapping, and not created
|
|
|
|
|
on-the-fly by ``__missing__`` or ``__getitem__``. For example,
|
|
|
|
|
``collections.defaultdict`` instances will only match patterns with keys that
|
|
|
|
|
were already present when the ``match`` block was entered.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
.. _class_pattern:
|
|
|
|
|
|
2020-06-29 15:11:20 -04:00
|
|
|
|
Class Patterns
|
|
|
|
|
~~~~~~~~~~~~~~
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
2020-07-07 19:00:55 -04:00
|
|
|
|
Simplified syntax::
|
|
|
|
|
|
|
|
|
|
class_pattern:
|
|
|
|
|
| name_or_attr '(' ')'
|
|
|
|
|
| name_or_attr '(' ','.pattern+ ','? ')'
|
|
|
|
|
| name_or_attr '(' ','.keyword_pattern+ ','? ')'
|
|
|
|
|
| name_or_attr '(' ','.pattern+ ',' ','.keyword_pattern+ ','? ')'
|
|
|
|
|
keyword_pattern: NAME '=' or_pattern
|
|
|
|
|
|
|
|
|
|
|
2020-06-23 11:27:36 -04:00
|
|
|
|
A class pattern provides support for destructuring arbitrary objects.
|
|
|
|
|
There are two possible ways of matching on object attributes: by position
|
2020-06-24 20:04:34 -04:00
|
|
|
|
like ``Point(1, 2)``, and by name like ``Point(x=1, y=2)``. These
|
2020-06-23 11:27:36 -04:00
|
|
|
|
two can be combined, but positional match cannot follow a match by name.
|
|
|
|
|
Each item in a class pattern can be an arbitrary pattern. A simple
|
|
|
|
|
example::
|
|
|
|
|
|
|
|
|
|
match shape:
|
|
|
|
|
case Point(x, y):
|
|
|
|
|
...
|
|
|
|
|
case Rectangle(x0, y0, x1, y1, painted=True):
|
|
|
|
|
...
|
|
|
|
|
|
2020-07-01 11:37:47 -04:00
|
|
|
|
Whether a match succeeds or not is determined by the equivalent of an
|
2020-07-06 00:29:52 -04:00
|
|
|
|
``isinstance`` call. If the subject (``shape``, in the example) is not
|
2020-07-01 11:37:47 -04:00
|
|
|
|
an instance of the named class (``Point`` or ``Rectangle``), the match
|
|
|
|
|
fails. Otherwise, it continues (see details in the `runtime`_
|
|
|
|
|
section).
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
The named class must be an instance of ``type``. It may be a single name
|
|
|
|
|
or a dotted name (e.g. ``some_mod.SomeClass`` or ``mod.pkg.Class``).
|
|
|
|
|
The leading name must not be ``_``, so e.g. ``_(...)`` and
|
|
|
|
|
``_.C(...)`` are invalid. Use ``object(foo=_)`` to check whether the
|
|
|
|
|
matched object has an attribute ``foo``.
|
|
|
|
|
|
2020-07-01 11:37:47 -04:00
|
|
|
|
By default, sub-patterns may only be matched by keyword for
|
2020-07-07 19:00:55 -04:00
|
|
|
|
user-defined classes. In order to support positional sub-patterns, a
|
2020-07-01 11:37:47 -04:00
|
|
|
|
custom ``__match_args__`` attribute is required.
|
|
|
|
|
The runtime allows matching against
|
|
|
|
|
arbitrarily nested patterns by chaining all of the instance checks and
|
|
|
|
|
attribute lookups appropriately.
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
|
2020-07-02 01:11:53 -04:00
|
|
|
|
Combining multiple patterns (OR patterns)
|
|
|
|
|
-----------------------------------------
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
Multiple alternative patterns can be combined into one using ``|``. This means
|
2020-06-24 20:25:13 -04:00
|
|
|
|
the whole pattern matches if at least one alternative matches.
|
2020-06-23 11:27:36 -04:00
|
|
|
|
Alternatives are tried from left to right and short-circuit:
subsequent patterns are not tried once one has matched. Examples::
|
|
|
|
|
|
|
|
|
|
match something:
|
|
|
|
|
case 0 | 1 | 2:
|
|
|
|
|
print("Small number")
|
|
|
|
|
case [] | [_]:
|
|
|
|
|
print("A short sequence")
|
|
|
|
|
case str() | bytes():
|
|
|
|
|
print("Something string-like")
|
|
|
|
|
case _:
|
|
|
|
|
print("Something else")
|
|
|
|
|
|
|
|
|
|
The alternatives may bind variables, as long as each alternative binds
|
|
|
|
|
the same set of variables (excluding ``_``). For example::
|
|
|
|
|
|
|
|
|
|
match something:
|
|
|
|
|
case 1 | x: # Error!
|
|
|
|
|
...
|
|
|
|
|
case x | 1: # Error!
|
|
|
|
|
...
|
|
|
|
|
case one := [1] | two := [2]: # Error!
|
|
|
|
|
...
|
|
|
|
|
case Foo(arg=x) | Bar(arg=x): # Valid, both arms bind 'x'
|
|
|
|
|
...
|
|
|
|
|
case [x] | x: # Valid, both arms bind 'x'
|
|
|
|
|
...
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
.. _guards:
|
|
|
|
|
|
|
|
|
|
Guards
|
|
|
|
|
------
|
|
|
|
|
|
2020-07-07 19:00:55 -04:00
|
|
|
|
Each *top-level* pattern can be followed by a **guard** of the form
|
2020-06-23 11:27:36 -04:00
|
|
|
|
``if expression``. A case clause succeeds if the pattern matches and the guard
|
2020-06-23 18:20:24 -04:00
|
|
|
|
evaluates to a true value. For example::
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
match input:
|
|
|
|
|
case [x, y] if x > MAX_INT and y > MAX_INT:
|
|
|
|
|
print("Got a pair of large numbers")
|
|
|
|
|
case x if x > MAX_INT:
|
|
|
|
|
print("Got a large number")
|
|
|
|
|
case [x, y] if x == y:
|
|
|
|
|
print("Got equal items")
|
|
|
|
|
case _:
|
|
|
|
|
print("Not an outstanding input")
|
|
|
|
|
|
|
|
|
|
If evaluating a guard raises an exception, it is propagated onwards rather
|
|
|
|
|
than failing the case clause. Names that appear in a pattern are bound before the
|
|
|
|
|
guard succeeds. So this will work::
|
|
|
|
|
|
|
|
|
|
values = [0]
|
|
|
|
|
|
2020-06-26 20:55:59 -04:00
|
|
|
|
match values:
|
2020-06-23 11:27:36 -04:00
|
|
|
|
case [x] if x:
|
|
|
|
|
... # This is not executed
|
|
|
|
|
case _:
|
|
|
|
|
...
|
|
|
|
|
print(x) # This will print "0"
|
|
|
|
|
|
|
|
|
|
Note that guards are not allowed for nested patterns, so that ``[x if x > 0]``
|
|
|
|
|
is a ``SyntaxError`` and ``1 | 2 if 3 | 4`` will be parsed as
|
|
|
|
|
``(1 | 2) if (3 | 4)``.
|
|
|
|
|
|
|
|
|
|
|
2020-06-30 23:26:10 -04:00
|
|
|
|
Walrus patterns
|
|
|
|
|
---------------
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
2020-06-30 23:26:10 -04:00
|
|
|
|
It is often useful to match a sub-pattern *and* bind the corresponding
|
2020-06-23 11:27:36 -04:00
|
|
|
|
value to a name. For example, it can be useful to write more efficient
|
2020-06-30 23:26:10 -04:00
|
|
|
|
matches, or simply to avoid repetition. To simplify such cases, any pattern
|
|
|
|
|
(other than the walrus pattern itself) can be preceded by a name and
|
|
|
|
|
the walrus operator (``:=``). For example::
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
match get_shape():
|
|
|
|
|
case Line(start := Point(x, y), end) if start == end:
|
|
|
|
|
print(f"Zero length line at {x}, {y}")
|
|
|
|
|
|
2020-06-30 23:26:10 -04:00
|
|
|
|
The name on the left of the walrus operator can be used in a guard, in
|
2020-06-23 11:27:36 -04:00
|
|
|
|
the match suite, or after the match statement. However, the name will
|
|
|
|
|
*only* be bound if the sub-pattern succeeds. Another example::
|
|
|
|
|
|
|
|
|
|
match group_shapes():
|
|
|
|
|
case [], [point := Point(x, y), *other]:
|
|
|
|
|
print(f"Got {point} in the second group")
|
|
|
|
|
process_coordinates(x, y)
|
|
|
|
|
...
|
|
|
|
|
|
|
|
|
|
Technically, most such examples can be rewritten using guards and/or nested
|
|
|
|
|
match statements, but this will be less readable and/or will produce less
|
|
|
|
|
efficient code. Essentially, most of the arguments in PEP 572 apply here
|
|
|
|
|
equally.
|
|
|
|
|
|
2020-07-07 19:00:55 -04:00
|
|
|
|
The wildcard ``_`` is not a valid name here.
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
|
|
|
|
|
.. _runtime:
|
|
|
|
|
|
|
|
|
|
Runtime specification
|
|
|
|
|
=====================
|
|
|
|
|
|
2020-07-01 11:37:47 -04:00
|
|
|
|
The Match Protocol
|
|
|
|
|
------------------
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
2020-07-01 11:37:47 -04:00
|
|
|
|
The equivalent of an ``isinstance`` call is used to decide whether an
|
|
|
|
|
object matches a given class pattern and to extract the corresponding
|
|
|
|
|
attributes. Classes requiring different matching semantics (such as
|
|
|
|
|
duck-typing) can do so by defining ``__instancecheck__`` (a
|
|
|
|
|
pre-existing metaclass hook) or by using ``typing.Protocol``.
|
2020-06-23 19:28:16 -04:00
|
|
|
|
|
|
|
|
|
The procedure is as follows:
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
2020-07-01 11:37:47 -04:00
|
|
|
|
* The class object for ``Class`` in ``Class(<sub-patterns>)`` is
|
|
|
|
|
looked up and ``isinstance(obj, Class)`` is called, where ``obj`` is
|
|
|
|
|
the value being matched. If false, the match fails.
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
* Otherwise, if any sub-patterns are given in the form of positional
|
|
|
|
|
or keyword arguments, these are matched from left to right, as
|
|
|
|
|
follows. The match fails as soon as a sub-pattern fails; if all
|
|
|
|
|
sub-patterns succeed, the overall class pattern match succeeds.
|
|
|
|
|
|
|
|
|
|
* If there are match-by-position items and the class has a
|
2020-06-26 20:55:59 -04:00
|
|
|
|
``__match_args__``, the item at position ``i``
|
2020-06-23 11:27:36 -04:00
|
|
|
|
is matched against the value looked up by attribute
|
|
|
|
|
``__match_args__[i]``. For example, a pattern ``Point2D(5, 8)``,
|
|
|
|
|
where ``Point2D.__match_args__ == ["x", "y"]``, is translated
|
|
|
|
|
(approximately) into ``obj.x == 5 and obj.y == 8``.
|
|
|
|
|
|
2020-06-29 14:54:42 -04:00
|
|
|
|
* If there are more positional items than the length of
|
|
|
|
|
``__match_args__``, a ``TypeError`` is raised.
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
2020-06-26 20:55:59 -04:00
|
|
|
|
* If the ``__match_args__`` attribute is absent on the matched class,
|
2020-06-29 12:32:21 -04:00
|
|
|
|
and one or more positional items appear in a match,
|
2020-06-29 14:54:42 -04:00
|
|
|
|
``TypeError`` is also raised. We don't fall back on
|
2020-06-23 11:27:36 -04:00
|
|
|
|
using ``__slots__`` or ``__annotations__`` -- "In the face of ambiguity,
|
|
|
|
|
refuse the temptation to guess."
|
|
|
|
|
|
|
|
|
|
* If there are any match-by-keyword items the keywords are looked up
|
2020-07-06 00:29:52 -04:00
|
|
|
|
as attributes on the subject. If the lookup succeeds the value is
|
2020-06-23 11:27:36 -04:00
|
|
|
|
matched against the corresponding sub-pattern. If the lookup fails,
|
2020-06-29 12:32:21 -04:00
|
|
|
|
the match fails.
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
Such a protocol favors simplicity of implementation over flexibility and
|
2020-06-28 23:30:08 -04:00
|
|
|
|
performance. For other considered alternatives, see `extended matching`_.
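The procedure above can be approximated in plain Python. The sketch below is not the reference implementation: ``match_class`` and ``ANY`` are hypothetical names, and sub-patterns are simplified to literal values compared by equality (with ``ANY`` standing in for the ``_`` wildcard).

```python
ANY = object()  # hypothetical stand-in for the ``_`` wildcard


def match_class(obj, cls, positional=(), keywords=None):
    """Approximate ``cls(*positional, **keywords)`` matched against ``obj``."""
    keywords = keywords or {}
    if not isinstance(obj, cls):
        return False
    names = []
    if positional:
        if not hasattr(cls, "__match_args__"):
            raise TypeError(f"{cls.__name__} accepts no positional sub-patterns")
        match_args = cls.__match_args__
        if len(positional) > len(match_args):
            raise TypeError(f"too many positional sub-patterns for {cls.__name__}")
        names.extend(match_args[: len(positional)])
    names.extend(keywords)
    expected = list(positional) + list(keywords.values())
    for name, wanted in zip(names, expected):  # left to right
        try:
            value = getattr(obj, name)
        except AttributeError:
            return False  # a failed lookup fails the match
        if wanted is not ANY and value != wanted:
            return False  # stop at the first failing sub-pattern
    return True


class Point2D:
    __match_args__ = ["x", "y"]

    def __init__(self, x, y):
        self.x = x
        self.y = y


point = Point2D(5, 8)
positional_ok = match_class(point, Point2D, (5, 8))
keyword_ok = match_class(point, Point2D, keywords={"y": 8})
missing_attr = match_class(point, Point2D, keywords={"z": ANY})
```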
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
2020-06-27 15:22:24 -04:00
|
|
|
|
For the most commonly-matched built-in types (``bool``,
|
|
|
|
|
``bytearray``, ``bytes``, ``dict``, ``float``,
|
|
|
|
|
``frozenset``, ``int``, ``list``, ``set``, ``str``, and ``tuple``), a
|
|
|
|
|
single positional sub-pattern is allowed to be passed to
|
2020-06-26 20:55:59 -04:00
|
|
|
|
the call. Rather than being matched against any particular attribute
|
2020-07-06 00:29:52 -04:00
|
|
|
|
on the subject, it is instead matched against the subject itself. This
|
2020-06-26 20:55:59 -04:00
|
|
|
|
creates behavior that is useful and intuitive for these objects:
|
|
|
|
|
|
|
|
|
|
* ``bool(False)`` matches ``False`` (but not ``0``).
|
|
|
|
|
* ``tuple((0, 1, 2))`` matches ``(0, 1, 2)`` (but not ``[0, 1, 2]``).
|
|
|
|
|
* ``int(i)`` matches any ``int`` and binds it to the name ``i``.
|
|
|
|
|
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
2020-07-09 17:58:06 -04:00
|
|
|
|
Overlapping sub-patterns
|
|
|
|
|
------------------------
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
2020-07-09 17:58:06 -04:00
|
|
|
|
Certain classes of overlapping matches are detected at
|
2020-06-29 14:54:42 -04:00
|
|
|
|
runtime and will raise exceptions. In addition to basic checks
|
|
|
|
|
described in the previous subsection:
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
* The interpreter will check that two match items are not targeting the same
|
|
|
|
|
attribute; for example, ``Point2D(1, 2, y=3)`` is an error.
|
|
|
|
|
|
2020-06-29 14:54:42 -04:00
|
|
|
|
* It will also check that a mapping pattern does not attempt to match
|
|
|
|
|
the same key more than once.
|
|
|
|
|
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
Special attribute ``__match_args__``
|
|
|
|
|
------------------------------------
|
|
|
|
|
|
2020-07-01 11:37:47 -04:00
|
|
|
|
The ``__match_args__`` attribute is always looked up on the type
|
|
|
|
|
object named in the pattern. If present, it must be a list or tuple
|
|
|
|
|
of strings naming the allowed positional arguments.
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
2020-07-01 11:37:47 -04:00
|
|
|
|
In deciding what names should be available for matching, the
|
|
|
|
|
recommended practice is that class patterns should be the mirror of
|
|
|
|
|
construction; that is, the set of available names and their types
|
|
|
|
|
should resemble the arguments to ``__init__()``.
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
2020-07-01 11:37:47 -04:00
|
|
|
|
Only match-by-name will work by default, and classes should define
|
|
|
|
|
``__match_args__`` as a class attribute if they would like to support
|
|
|
|
|
match-by-position. Additionally, dataclasses and named tuples will
|
|
|
|
|
support match-by-position out of the box. See below for more details.
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
2020-07-07 16:47:42 -04:00
|
|
|
|
Exceptions and side effects
|
|
|
|
|
---------------------------
|
2020-07-02 17:47:45 -04:00
|
|
|
|
|
|
|
|
|
While matching each case, the ``match`` statement may trigger execution of other
|
|
|
|
|
functions (for example ``__getitem__()``, ``__len__()`` or
|
|
|
|
|
a property). Almost every exception caused by those propagates outside of the
|
|
|
|
|
match statement normally. The only case where an exception is not propagated is
|
|
|
|
|
an ``AttributeError`` raised while trying to look up an attribute while matching
|
|
|
|
|
attributes of a Class Pattern; that case results in just a matching failure,
|
|
|
|
|
and the rest of the statement proceeds normally.
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
2020-07-07 16:47:42 -04:00
|
|
|
|
The only side-effect produced explicitly by the matching process is the binding of
|
|
|
|
|
names. However, the process relies on attribute access,
|
|
|
|
|
instance checks, ``len()``, equality and item access on the subject and some of
|
|
|
|
|
its components. It also evaluates constant value patterns and the left side of
|
|
|
|
|
class patterns. While none of those typically create any side-effects, some of
|
|
|
|
|
these objects could. This proposal intentionally leaves out any specification
|
|
|
|
|
of what methods are called or how many times. User code relying on that
|
|
|
|
|
behavior should be considered buggy.
|
|
|
|
|
|
2020-06-23 11:27:36 -04:00
|
|
|
|
The standard library
|
2020-06-23 19:28:16 -04:00
|
|
|
|
--------------------
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
To facilitate the use of pattern matching, several changes will be made to
|
|
|
|
|
the standard library:
|
|
|
|
|
|
|
|
|
|
* Namedtuples and dataclasses will have auto-generated ``__match_args__``.
|
|
|
|
|
|
|
|
|
|
* For dataclasses the order of attributes in the generated ``__match_args__``
|
|
|
|
|
will be the same as the order of corresponding arguments in the generated
|
|
|
|
|
``__init__()`` method. This includes the situations where attributes are
|
|
|
|
|
inherited from a superclass.
|
|
|
|
|
|
2020-07-01 11:37:47 -04:00
|
|
|
|
In addition, a systematic effort will be put into going through
|
|
|
|
|
existing standard library classes and adding ``__match_args__`` where
|
|
|
|
|
it looks beneficial.
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
|
|
|
|
|
.. _static checkers:
|
|
|
|
|
|
|
|
|
|
Static checkers specification
|
|
|
|
|
=============================
|
|
|
|
|
|
|
|
|
|
Exhaustiveness checks
|
|
|
|
|
---------------------
|
|
|
|
|
|
|
|
|
|
From a reliability perspective, experience shows that missing a case when
|
|
|
|
|
dealing with a set of possible data values leads to hard-to-debug issues,
|
|
|
|
|
thus forcing people to add safety asserts like this::
|
|
|
|
|
|
|
|
|
|
def get_first(data: Union[int, list[int]]) -> int:
|
|
|
|
|
if isinstance(data, list) and data:
|
|
|
|
|
return data[0]
|
|
|
|
|
elif isinstance(data, int):
|
|
|
|
|
return data
|
|
|
|
|
else:
|
|
|
|
|
assert False, "should never get here"
|
|
|
|
|
|
|
|
|
|
PEP 484 specifies that static type checkers should support exhaustiveness in
|
|
|
|
|
conditional checks with respect to enum values. PEP 586 later generalized this
|
|
|
|
|
requirement to literal types.
|
|
|
|
|
|
|
|
|
|
This PEP further generalizes this requirement to
|
|
|
|
|
arbitrary patterns. A typical situation where this applies is matching an
|
|
|
|
|
expression with a union type::
|
|
|
|
|
|
|
|
|
|
def classify(val: Union[int, Tuple[int, int], List[int]]) -> str:
|
|
|
|
|
match val:
|
|
|
|
|
case [x, y] if x > 0 and y > 0:
|
|
|
|
|
return f"A pair of {x} and {y}"
|
2020-07-09 17:51:09 -04:00
|
|
|
|
case [x, *other]:
|
|
|
|
|
return f"A sequence starting with {x}"
|
2020-06-23 11:27:36 -04:00
|
|
|
|
case int():
|
|
|
|
|
return "Some integer"
|
|
|
|
|
# Type-checking error: some cases unhandled.
|
|
|
|
|
|
|
|
|
|
The exhaustiveness checks should also apply where both pattern matching
|
|
|
|
|
and enum values are combined::
|
|
|
|
|
|
|
|
|
|
from enum import Enum
|
|
|
|
|
from typing import Union
|
|
|
|
|
|
|
|
|
|
class Level(Enum):
|
|
|
|
|
BASIC = 1
|
|
|
|
|
ADVANCED = 2
|
|
|
|
|
PRO = 3
|
|
|
|
|
|
|
|
|
|
class User:
|
|
|
|
|
name: str
|
|
|
|
|
level: Level
|
|
|
|
|
|
|
|
|
|
class Admin:
|
|
|
|
|
name: str
|
|
|
|
|
|
|
|
|
|
account: Union[User, Admin]
|
|
|
|
|
|
|
|
|
|
match account:
|
|
|
|
|
case Admin(name=name) | User(name=name, level=Level.PRO):
|
|
|
|
|
...
|
|
|
|
|
case User(level=Level.ADVANCED):
|
|
|
|
|
...
|
|
|
|
|
# Type-checking error: basic user unhandled
|
|
|
|
|
|
|
|
|
|
Obviously, no ``Matchable`` protocol (in terms of PEP 544) is needed, since
|
|
|
|
|
every class is matchable and therefore is subject to the checks specified
|
|
|
|
|
above.
|
|
|
|
|
|
|
|
|
|
|
2020-06-24 21:14:29 -04:00
|
|
|
|
Sealed classes as algebraic data types
|
|
|
|
|
--------------------------------------
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
Quite often it is desirable to apply exhaustiveness to a set of classes without
|
|
|
|
|
defining ad-hoc union types, which is itself fragile if a class is missing in
|
|
|
|
|
the union definition. A design pattern where a group of record-like classes is
|
|
|
|
|
combined into a union is popular in other languages that support pattern
|
2020-06-24 21:14:29 -04:00
|
|
|
|
matching and is known as algebraic data types [2]_.
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
We propose to add a special decorator class ``@sealed`` to the ``typing``
|
|
|
|
|
module [6]_, that will have no effect at runtime, but will indicate to static
|
|
|
|
|
type checkers that all subclasses (direct and indirect) of this class should
|
|
|
|
|
be defined in the same module as the base class.
|
|
|
|
|
|
|
|
|
|
The idea is that since all subclasses are known, the type checker can treat
|
|
|
|
|
the sealed base class as a union of all its subclasses. Together with
|
2020-06-24 21:14:29 -04:00
|
|
|
|
dataclasses this allows a clean and safe support of algebraic data types
|
|
|
|
|
in Python. Consider this example::
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
from dataclasses import dataclass
|
|
|
|
|
from typing import sealed
|
|
|
|
|
|
|
|
|
|
@sealed
|
|
|
|
|
class Node:
|
|
|
|
|
...
|
|
|
|
|
|
|
|
|
|
class Expression(Node):
|
|
|
|
|
...
|
|
|
|
|
|
|
|
|
|
class Statement(Node):
|
|
|
|
|
...
|
|
|
|
|
|
|
|
|
|
@dataclass
|
|
|
|
|
class Name(Expression):
|
|
|
|
|
name: str
|
|
|
|
|
|
|
|
|
|
@dataclass
|
|
|
|
|
class Operation(Expression):
|
|
|
|
|
left: Expression
|
|
|
|
|
op: str
|
|
|
|
|
right: Expression
|
|
|
|
|
|
|
|
|
|
@dataclass
|
|
|
|
|
class Assignment(Statement):
|
|
|
|
|
target: str
|
|
|
|
|
value: Expression
|
|
|
|
|
|
2020-06-23 18:20:24 -04:00
|
|
|
|
@dataclass
|
2020-06-23 11:27:36 -04:00
|
|
|
|
class Print(Statement):
|
|
|
|
|
value: Expression
|
|
|
|
|
|
|
|
|
|
With such a definition, a type checker can safely treat ``Node`` as
|
|
|
|
|
``Union[Name, Operation, Assignment, Print]``, and also safely treat e.g.
|
|
|
|
|
``Expression`` as ``Union[Name, Operation]``. So this will result in a type
|
|
|
|
|
checking error in the snippet below, because ``Name`` is not handled (and the type
|
|
|
|
|
checker can give a useful error message)::
|
|
|
|
|
|
|
|
|
|
def dump(node: Node) -> str:
|
|
|
|
|
match node:
|
|
|
|
|
case Assignment(target, value):
|
|
|
|
|
return f"{target} = {dump(value)}"
|
|
|
|
|
case Print(value):
|
|
|
|
|
return f"print({dump(value)})"
|
|
|
|
|
case Operation(left, op, right):
|
|
|
|
|
return f"({dump(left)} {op} {dump(right)})"
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Type erasure
|
|
|
|
|
------------
|
|
|
|
|
|
|
|
|
|
Class patterns are subject to runtime type erasure. Namely, although one
|
|
|
|
|
can define a type alias ``IntQueue = Queue[int]`` so that a pattern like
|
2020-06-23 18:20:24 -04:00
|
|
|
|
``IntQueue()`` is syntactically valid, type checkers should reject such a
|
2020-06-23 11:27:36 -04:00
|
|
|
|
match::
|
|
|
|
|
|
|
|
|
|
queue: Union[Queue[int], Queue[str]]
|
|
|
|
|
match queue:
|
|
|
|
|
case IntQueue(): # Type-checking error here
|
|
|
|
|
...
|
|
|
|
|
|
|
|
|
|
Note that the above snippet actually fails at runtime with the current
|
|
|
|
|
implementation of generic classes in the ``typing`` module, as well as
|
|
|
|
|
with builtin generic classes in the recently accepted PEP 585, because
|
|
|
|
|
they prohibit ``isinstance`` checks.
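The ``isinstance`` restriction itself can be demonstrated without a ``match`` statement. The sketch below assumes Python 3.9 or later (for ``list[int]``); the ``Queue`` class is a hypothetical stand-in for the one in the example above.

```python
from typing import Generic, TypeVar

T = TypeVar("T")


class Queue(Generic[T]):
    def __init__(self):
        self.items = []


IntQueue = Queue[int]  # a type alias, not a distinct runtime class
queue = Queue()

plain_check = isinstance(queue, Queue)  # the unparameterized class is fine

try:
    isinstance(queue, IntQueue)
except TypeError:
    typing_alias_rejected = True
else:
    typing_alias_rejected = False

try:
    isinstance([], list[int])  # PEP 585 builtin generic
except TypeError:
    builtin_alias_rejected = True
else:
    builtin_alias_rejected = False
```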
|
|
|
|
|
|
|
|
|
|
To clarify, generic classes are not prohibited in general from participating
|
|
|
|
|
in pattern matching, just that their type parameters can't be explicitly
|
|
|
|
|
specified. It is still fine if sub-patterns or literals bind the type
|
|
|
|
|
variables. For example::
|
|
|
|
|
|
|
|
|
|
from typing import Generic, TypeVar, Union
|
|
|
|
|
|
|
|
|
|
T = TypeVar('T')
|
|
|
|
|
|
|
|
|
|
class Result(Generic[T]):
|
|
|
|
|
first: T
|
|
|
|
|
other: list[T]
|
|
|
|
|
|
|
|
|
|
result: Union[Result[int], Result[str]]
|
|
|
|
|
|
|
|
|
|
match result:
|
|
|
|
|
case Result(first=int()):
|
|
|
|
|
... # Type of result is Result[int] here
|
|
|
|
|
case Result(other=["foo", "bar", *rest]):
|
|
|
|
|
... # Type of result is Result[str] here
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Note about constants
|
|
|
|
|
--------------------
|
|
|
|
|
|
2020-06-29 15:11:20 -04:00
|
|
|
|
The fact that a capture pattern is always an assignment target may create unwanted
|
2020-06-23 11:27:36 -04:00
|
|
|
|
consequences when a user by mistake tries to "match" a value against
|
|
|
|
|
a constant instead of using the constant value pattern. As a result, at
|
|
|
|
|
runtime such a match will always succeed and moreover overwrite the value of
|
|
|
|
|
the constant. It is important therefore that static type checkers warn about
|
|
|
|
|
such situations. For example::
|
|
|
|
|
|
|
|
|
|
from typing import Final
|
|
|
|
|
|
|
|
|
|
MAX_INT: Final = 2 ** 64
|
|
|
|
|
|
|
|
|
|
value = 0
|
|
|
|
|
|
|
|
|
|
match value:
|
|
|
|
|
case MAX_INT: # Type-checking error here: cannot assign to final name
|
|
|
|
|
print("Got big number")
|
|
|
|
|
case _:
|
|
|
|
|
print("Something else")
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Precise type checking of star matches
|
|
|
|
|
-------------------------------------
|
|
|
|
|
|
|
|
|
|
Type checkers should perform precise type checking of star items in pattern
|
|
|
|
|
matching giving them either a heterogeneous ``list[T]`` type, or
|
|
|
|
|
a ``TypedDict`` type as specified by PEP 589. For example::
|
|
|
|
|
|
|
|
|
|
stuff: Tuple[int, str, str, float]
|
|
|
|
|
|
|
|
|
|
match stuff:
|
|
|
|
|
case a, *b, 0.5:
|
|
|
|
|
# Here a is int and b is list[str]
|
|
|
|
|
...
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Performance Considerations
|
|
|
|
|
==========================
|
|
|
|
|
|
|
|
|
|
Ideally, a ``match`` statement should have good runtime performance compared
|
|
|
|
|
to an equivalent chain of if-statements. Although the history of programming
|
|
|
|
|
languages is rife with examples of new features which increased engineer
|
|
|
|
|
productivity at the expense of additional CPU cycles, it would be
|
|
|
|
|
unfortunate if the benefits of ``match`` were counter-balanced by a significant
|
|
|
|
|
overall decrease in runtime performance.
|
|
|
|
|
|
|
|
|
|
Although this PEP does not specify any particular implementation strategy,
|
|
|
|
|
a few words about the prototype implementation and how it attempts to
|
|
|
|
|
maximize performance are in order.
|
|
|
|
|
|
|
|
|
|
Basically, the prototype implementation transforms all of the ``match``
|
|
|
|
|
statement syntax into equivalent if/else blocks - or more accurately, into
|
|
|
|
|
Python byte codes that have the same effect. In other words, all of the
|
|
|
|
|
logic for testing instance types, sequence lengths, mapping keys and
|
|
|
|
|
so on is inlined in place of the ``match``.
|
|
|
|
|
|
|
|
|
|
This is not the only possible strategy, nor is it necessarily the best.
|
2020-07-01 11:37:47 -04:00
|
|
|
|
For example, the instance checks could be memoized, especially
|
2020-06-23 11:27:36 -04:00
|
|
|
|
if there are multiple instances of the same class type but with different
|
|
|
|
|
arguments in a single match statement. It is also theoretically
|
2020-07-09 20:33:01 -04:00
|
|
|
|
possible for a future implementation to process case clauses or sub-patterns in
|
2020-06-23 11:27:36 -04:00
|
|
|
|
parallel using a decision tree rather than testing them one by one.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Backwards Compatibility
|
|
|
|
|
=======================
|
|
|
|
|
|
|
|
|
|
This PEP is fully backwards compatible: the ``match`` and ``case``
|
|
|
|
|
keywords are proposed to be (and stay!) soft keywords, so their use as
|
|
|
|
|
variable, function, class, module or attribute names is not impeded at
|
|
|
|
|
all.
|
|
|
|
|
|
|
|
|
|
This is important because ``match`` is the name of a popular and
|
|
|
|
|
well-known function and method in the ``re`` module, which we have no
|
|
|
|
|
desire to break or deprecate.
|
|
|
|
|
|
|
|
|
|
The difference between hard and soft keywords is that hard keywords
|
|
|
|
|
are *always* reserved words, even in positions where they make no
|
|
|
|
|
sense (e.g. ``x = class + 1``), while soft keywords only get a special
|
|
|
|
|
meaning in context. Since our parser backtracks, that means that on
|
|
|
|
|
different attempts to parse a code fragment it could interpret a soft
|
|
|
|
|
keyword differently.
|
|
|
|
|
|
|
|
|
|
For example, suppose the parser encounters the following input::
|
|
|
|
|
|
|
|
|
|
match [x, y]:
|
|
|
|
|
|
|
|
|
|
The parser first attempts to parse this as an expression statement.
|
|
|
|
|
It interprets ``match`` as a NAME token, and then considers ``[x,
|
|
|
|
|
y]`` to be a double subscript. It then encounters the colon and has
|
|
|
|
|
to backtrack, since an expression statement cannot be followed by a
|
|
|
|
|
colon. The parser then backtracks to the start of the line and finds
|
|
|
|
|
that ``match`` is a soft keyword allowed in this position. It then
|
|
|
|
|
considers ``[x, y]`` to be a list expression. The colon then is just
|
|
|
|
|
what the parser expected, and the parse succeeds.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Impacts on third-party tools
|
|
|
|
|
============================
|
|
|
|
|
|
|
|
|
|
There are a lot of tools in the Python ecosystem that operate on Python
|
|
|
|
|
source code: linters, syntax highlighters, auto-formatters, and IDEs. These
|
|
|
|
|
will all need to be updated to include awareness of the ``match`` statement.
|
|
|
|
|
|
|
|
|
|
In general, these tools fall into one of two categories:
|
|
|
|
|
|
|
|
|
|
**Shallow** parsers don't try to understand the full syntax of Python, but
|
|
|
|
|
instead scan the source code for specific known patterns. IDEs, such as Visual
|
|
|
|
|
Studio Code, Emacs and TextMate, tend to fall in this category, since frequently
|
|
|
|
|
the source code is invalid while being edited, and a strict approach to parsing
|
|
|
|
|
would fail.
|
|
|
|
|
|
|
|
|
|
For these kinds of tools, adding knowledge of a new keyword is relatively
|
|
|
|
|
easy: just an addition to a table, or perhaps modification of a regular
|
|
|
|
|
expression.
|
|
|
|
|
|
|
|
|
|
**Deep** parsers understand the complete syntax of Python. An example of this
|
|
|
|
|
is the auto-formatter Black [9]_. A particular requirement with these kinds of
|
|
|
|
|
tools is that they not only need to understand the syntax of the current version
|
|
|
|
|
of Python, but older versions of Python as well.
|
|
|
|
|
|
|
|
|
|
The ``match`` statement uses a soft keyword, and it is one of the first major
|
|
|
|
|
Python features to take advantage of the capabilities of the new PEG parser. This
|
|
|
|
|
means that third-party parsers which are not 'PEG-compatible' will have a hard
|
|
|
|
|
time with the new syntax.
|
|
|
|
|
|
|
|
|
|
It has been noted that a number of these third-party tools leverage common parsing
|
|
|
|
|
libraries (Black for example uses a fork of the lib2to3 parser). It may be helpful
|
|
|
|
|
to identify widely-used parsing libraries (such as parso [10]_ and libCST [11]_)
|
|
|
|
|
and upgrade them to be PEG compatible.
|
|
|
|
|
|
|
|
|
|
However, since this work would need to be done not only for the match statement,
|
2020-06-23 15:38:03 -04:00
|
|
|
|
but for *any* new Python syntax that leverages the capabilities of the PEG parser,
|
2020-06-23 11:27:36 -04:00
|
|
|
|
it is considered out of scope for this PEP. (Although it is suggested that this
|
|
|
|
|
would make a fine Summer of Code project.)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Reference Implementation
========================

A `feature-complete CPython implementation
<https://github.com/brandtbucher/cpython/tree/patma>`_ is available on
GitHub.

An `interactive playground
<https://mybinder.org/v2/gh/gvanrossum/patma/master?urlpath=lab/tree/playground-622.ipynb>`_
based on the above implementation was created using Binder [12]_ and Jupyter [13]_.


Example Code
============

A small `collection of example code
<https://github.com/gvanrossum/patma/tree/master/examples>`_ is
available on GitHub.

.. _rejected ideas:

Rejected Ideas
==============

This general idea has been floating around for a pretty long time, and many
back and forth decisions were made. Here we summarize many alternative
paths that were taken but eventually abandoned.

Don't do this, pattern matching is hard to learn
------------------------------------------------

In our opinion, the proposed pattern matching is not more difficult than
adding ``isinstance()`` and ``getattr()`` to iterable unpacking. Also, we
believe the proposed syntax significantly improves readability for a wide
range of code patterns, by allowing one to express *what* one wants to do, rather
than *how* to do it. We hope the few real code snippets we included in the PEP
above illustrate this comparison well enough. For more real code examples
and their translations see Ref. [7]_.

Don't do this, use existing method dispatching mechanisms
---------------------------------------------------------

We recognize that some of the use cases for the ``match`` statement overlap
with what can be done with traditional object-oriented programming (OOP) design
techniques using class inheritance. The ability to choose alternate
behaviors based on testing the runtime type of a match subject might
even seem heretical to strict OOP purists.

However, Python has always been a language that embraces a variety of
programming styles and paradigms. Classic Python design idioms such as
"duck"-typing go beyond the traditional OOP model.

We believe that there are important use cases where the use of ``match`` results
in a cleaner and more maintainable architecture. These use cases tend to
be characterized by a number of features:

* Algorithms which cut across traditional lines of data encapsulation. If an
  algorithm is processing heterogeneous elements of different types (such as
  evaluating or transforming an abstract syntax tree, or doing algebraic
  manipulation of mathematical symbols), forcing the user to implement
  the algorithm as individual methods on each element type results in
  logic that is smeared across the entire codebase instead of being neatly
  localized in one place.
* Program architectures where the set of possible data types is relatively
  stable, but there is an ever-expanding set of operations to be performed
  on those data types. Doing this in a strict OOP fashion requires constantly
  adding new methods to both the base class and subclasses to support the new
  operations, "polluting" the base class with lots of very specialized method
  definitions, and causing widespread disruption and churn in the code. By
  contrast, in a ``match``-based dispatch, adding a new behavior merely
  involves writing a new ``match`` statement.
* OOP also does not handle dispatching based on the *shape* of an object, such
  as the length of a tuple, or the presence of an attribute -- instead any such
  dispatching decision must be encoded into the object's type. Shape-based
  dispatching is particularly interesting when it comes to handling "duck"-typed
  objects.

Where OOP is clearly superior is in the opposite case: where the set of possible
operations is relatively stable and well-defined, but there is an ever-growing
set of data types to operate on. A classic example of this is UI widget toolkits,
where there is a fixed set of interaction types (repaint, mouse click, keypress,
and so on), but the set of widget types is constantly expanding as developers
invent new and creative user interaction styles. Adding a new kind of widget
is a simple matter of writing a new subclass, whereas with a ``match``-based
approach you end up having to add a new case clause to many ``match`` statements
scattered across the codebase. We therefore don't recommend using ``match`` in
such a situation.

Allow more flexible assignment targets instead
----------------------------------------------

There was an idea to just generalize iterable unpacking to much more
general assignment targets, instead of adding a new kind of statement.
This concept is known in some other languages as "irrefutable matches". We
decided not to do this because inspection of real-life potential use cases
showed that in the vast majority of cases destructuring is related to an ``if``
condition. Also, many of those are grouped in a series of exclusive choices.


Make it an expression
---------------------

In most other languages pattern matching is represented by an expression, not
a statement. But making it an expression would be inconsistent with other
syntactic choices in Python. All decision making logic is expressed almost
exclusively in statements, so we decided not to deviate from this.

Use a hard keyword
------------------

There were options to make ``match`` a hard keyword, or choose a different
keyword. Although using a hard keyword would simplify life for simple-minded
syntax highlighters, we decided not to use a hard keyword for several reasons:

* Most importantly, the new parser doesn't require us to do this. Unlike
  ``async``, which caused hardships by being a soft keyword for a few releases,
  here we can make ``match`` a permanent soft keyword.

* ``match`` is so commonly used in existing code that making it a hard keyword
  would break almost every existing program and would put a burden to fix code
  on many people who may not even benefit from the new syntax.

* It is hard to find an alternative keyword that would not be commonly used
  in existing programs as an identifier, and would still clearly reflect the
  meaning of the statement.

Use ``as`` or ``|`` instead of ``case`` for case clauses
--------------------------------------------------------

The pattern matching proposed here is a combination of multi-branch control
flow (in line with ``switch`` in Algol-derived languages or ``cond`` in Lisp)
and object-deconstruction as found in functional languages. While the proposed
keyword ``case`` highlights the multi-branch aspect, alternative keywords such
as ``as`` would equally be possible, highlighting the deconstruction aspect.
``as`` or ``with``, for instance, also have the advantage of already being
keywords in Python. However, since ``case`` as a keyword can only occur as a
leading keyword inside a ``match`` statement, it is easy for a parser to
distinguish between its use as a keyword or as a variable.

Other variants would use a symbol like ``|`` or ``=>``, or go entirely without
a special marker.

Since Python is a statement-oriented language in the tradition of Algol, and as
each composite statement starts with an identifying keyword, ``case`` seemed to
be most in line with Python's style and traditions.

Use a flat indentation scheme
-----------------------------

There was an idea to use an alternative indentation scheme, for example where
every case clause would not be indented with respect to the initial ``match``
part::

    match expression:
    case pattern_1:
        ...
    case pattern_2:
        ...

Although flat indentation saves some horizontal space, it may look awkward
to the eye of a Python programmer, because everywhere else a colon is followed
by an indent. This would also complicate life for simple-minded code editors.
Finally, the horizontal space issue can be alleviated by allowing "half-indent"
(i.e. two spaces instead of four) for match statements.

In sample programs using ``match``, written as part of the development of this
PEP, a noticeable improvement in code brevity is observed, more than making up
for the additional indentation level.

Another proposal considered was to use flat indentation but put the
expression on the line after ``match:``, like this::

    match:
        expression
    case pattern_1:
        ...
    case pattern_2:
        ...

This was ultimately rejected because the first block would be a
novelty in Python's grammar: a block whose only content is a single
expression rather than a sequence of statements.

Alternatives for constant value pattern
---------------------------------------

This is probably the trickiest item. Matching against some pre-defined
constants is very common, but the dynamic nature of Python also makes it
ambiguous with capture patterns. Five other alternatives were considered:

* Use some implicit rules. For example, if a name was defined in the global
  scope, then it refers to a constant, rather than representing a
  capture pattern::

    # Here, the name "spam" must be defined in the global scope (and
    # not shadowed locally). "side" must be local.

    match entree[-1]:
        case spam: ...  # Compares entree[-1] == spam.
        case side: ...  # Assigns side = entree[-1].

  This however can cause surprises and action at a distance if someone
  defines an unrelated coinciding name before the match statement.

* Use a rule based on the case of a name. In particular, if the name
  starts with a lowercase letter it would be a capture pattern, while if
  it starts with uppercase it would refer to a constant::

    match entree[-1]:
        case SPAM: ...  # Compares entree[-1] == SPAM.
        case side: ...  # Assigns side = entree[-1].

  This works well with the recommendations for naming constants from
  PEP 8. The main objection is that there's no other part of core
  Python where the case of a name is semantically significant.
  In addition, Python allows identifiers to use different scripts,
  many of which (e.g. CJK) don't have a case distinction.

* Use extra parentheses to indicate lookup semantics for a given name. For
  example::

    match entree[-1]:
        case (spam): ...  # Compares entree[-1] == spam.
        case side: ...    # Assigns side = entree[-1].

  This may be a viable option, but it can create some visual noise if used
  often. It also looks pretty unusual, especially in nested contexts.

  This also has the problem that we may want or need parentheses to
  disambiguate grouping in patterns, e.g. in
  ``Point(x, y=(y := complex()))``.

* Introduce a special symbol, for example ``.``, ``$``, or ``^`` to
  indicate that a given name is a constant to be matched against, not
  to be assigned to. An earlier version of this proposal used a
  leading-dot rule::

    match entree[-1]:
        case .spam: ...  # Compares entree[-1] == spam.
        case side: ...   # Assigns side = entree[-1].

  While potentially useful, it introduces strange-looking new syntax
  without making the pattern syntax any more expressive. Indeed,
  named constants can be made to work with the existing rules by
  converting them to ``Enum`` types, or enclosing them in their own
  namespace (considered by the authors to be one honking great idea)::

    match entree[-1]:
        case Sides.SPAM: ...  # Compares entree[-1] == Sides.SPAM.
        case side: ...        # Assigns side = entree[-1].

  If needed, the leading-dot rule (or a similar variant) could be
  added back later with no backward-compatibility issues.

* There was also an idea to make lookup semantics the default, and require
  ``$`` or ``?`` to be used in capture patterns::

    match entree[-1]:
        case spam: ...   # Compares entree[-1] == spam.
        case side?: ...  # Assigns side = entree[-1].

  There are a few issues with this:

  * Capture patterns are more common in typical code, so it is
    undesirable to require special syntax for them.

  * The authors are not aware of any other language that adorns
    captures in this way.

  * None of the proposed syntaxes have any precedent in Python.

  * It would break the syntactic parallels of the current grammar::

      match coords:
          case ($x, $y):
              return Point(x, y)  # Why not "Point($x, $y)"?

In the end, these alternatives were rejected because of the mentioned drawbacks.

Disallow float literals in patterns
-----------------------------------

Because of the inexactness of floats, an early version of this proposal
did not allow floating-point constants to be used as match patterns. Part
of the justification for this prohibition is that Rust does this.

However, during implementation, it was discovered that distinguishing between
float values and other types required extra code in the VM that would slow
matches generally. Given that Python and Rust are very different languages
with different user bases and underlying philosophies, it was felt that
allowing float literals would not cause too much harm, and would be less
surprising to users.

Range matching patterns
-----------------------

This would allow patterns such as ``1...6``. However, there are a host of
ambiguities:

* Is the range open, half-open, or closed? (I.e. is ``6`` included in the
  above example or not?)
* Does the range match a single number, or a range object?
* Range matching is often used for character ranges (``'a'...'z'``) but that
  won't work in Python since there's no character data type, just strings.
* Range matching can be a significant performance optimization if you can
  pre-build a jump table, but that's not generally possible in Python due
  to the fact that names can be dynamically rebound.

Rather than creating a special-case syntax for ranges, it was decided
that allowing custom pattern objects (``InRange(0, 6)``) would be more flexible
and less ambiguous; however those ideas have been postponed for the time
being (see `deferred ideas`_).

Use dispatch dict semantics for matches
---------------------------------------

Implementations of the classic ``switch`` statement sometimes use a pre-computed
hash table instead of chained equality comparisons to gain some performance.
In the context of the ``match`` statement this is technically also possible for
matches against literal patterns. However, having subtly different semantics
for different kinds of patterns would be too surprising for a potentially
modest performance win.

We can still experiment with possible performance optimizations in this
direction if they will not cause semantic differences.

Use ``continue`` and ``break`` in case clauses
----------------------------------------------

Another rejected proposal was to define new meanings for ``continue``
and ``break`` inside of ``match``, which would have the following behavior:

* ``continue`` would exit the current case clause and continue matching
  at the next case clause.
* ``break`` would exit the match statement.

However, there is a serious drawback to this proposal: if the ``match`` statement
is nested inside of a loop, the meanings of ``continue`` and ``break`` are now
changed. This may cause unexpected behavior during refactorings; also, an
argument can be made that there are other means to get the same behavior (such
as using guard conditions), and that in practice it's likely that the existing
behavior of ``continue`` and ``break`` is far more useful.

AND (``&``) patterns
--------------------

This proposal defines an OR-pattern (``|``) to match one of several alternates;
why not also an AND-pattern (``&``)? Especially given that some other languages
(F# for example) support this.

However, it's not clear how useful this would be. The semantics for matching
dictionaries, objects and sequences already incorporate an implicit 'and': all
attributes and elements mentioned must be present for the match to succeed. Guard
conditions can also support many of the use cases that a hypothetical 'and'
operator would be used for.

In the end, it was decided that this would make the syntax more complex without
adding a significant benefit.

Negative match patterns
-----------------------

A negation of a match pattern using the operator ``!`` as a prefix would match
exactly if the pattern itself does not match. For instance, ``!(3 | 4)``
would match anything except ``3`` or ``4``.

This was rejected because there is documented evidence [8]_ that this feature
is rarely useful (in languages which support it) or used as double negation
``!!`` to control variable scopes and prevent variable bindings (which does
not apply to Python). It can also be simulated using guard conditions.

Check exhaustiveness at runtime
-------------------------------

The question is what to do if no case clause has a matching pattern, and
there is no default case. An earlier version of the proposal specified that
the behavior in this case would be to raise an exception rather than
silently falling through.

The arguments back and forth were many, but in the end the EIBTI (Explicit
Is Better Than Implicit) argument won out: it's better to have the programmer
explicitly raise an exception if that is the behavior they want.

For cases such as sealed classes and enums, where the patterns are all known
to be members of a discrete set, `static checkers`_ can warn about missing
patterns.

Type annotations for pattern variables
--------------------------------------

The proposal was to combine patterns with type annotations::

    match x:
        case [a: int, b: str]: print(f"An int {a} and a string {b}")
        case [a: int, b: int, c: int]: print("Three ints", a, b, c)
        ...

This idea has a lot of problems. For one, the colon can only
be used inside of brackets or parens, otherwise the syntax becomes
ambiguous. And because Python disallows ``isinstance()`` checks
on generic types, type annotations containing generics will not
work as expected.

Allow ``*rest`` in class patterns
---------------------------------

It was proposed to allow ``*rest`` in a class pattern, giving a
variable to be bound to all positional arguments at once (similar to
its use in unpacking assignments). It would provide some symmetry
with sequence patterns. But it might be confused with a feature to
provide the *values* for all positional arguments at once. And there
seems to be no practical need for it, so it was scrapped. (It could
easily be added at a later stage if a need arises.)

Disallow ``_.a`` in constant value patterns
-------------------------------------------

The first public draft said that the initial name in a constant value
pattern must not be ``_`` because ``_`` has a special meaning in
pattern matching, so this would be invalid::

    case _.a: ...

(However, ``a._`` would be legal and load the attribute with name
``_`` of the object ``a`` as usual.)

There was some pushback against this on python-dev (some people have a
legitimate use for ``_`` as an important global variable, esp. in
i18n) and the only reason for this prohibition was to prevent some
user confusion. But it's not the hill to die on.

Use some other token as wildcard
--------------------------------

It has been proposed to use ``...`` (i.e., the ellipsis token) or
``*`` (star) as a wildcard. However, both these look as if an
arbitrary number of items is omitted::

    case [a, ..., z]: ...
    case [a, *, z]: ...

Both look like they would match a sequence of two or more items,
capturing the first and last values.

In addition, if ``*`` were to be used as the wildcard character, we
would have to come up with some other way to capture the rest of a
sequence, currently spelled like this::

    case [first, second, *rest]: ...

Using an ellipsis would also be more confusing in documentation and
examples, where ``...`` is routinely used to indicate something
obvious or irrelevant. (Yes, this would also be an argument against
the other uses of ``...`` in Python, but that water is already under
the bridge.)

Another proposal was to use ``?``. This could be acceptable, although
it would require modifying the tokenizer.

Also, ``_`` is already used as a throwaway target in other contexts,
and this use is pretty similar. This example is from ``difflib.py``
in the stdlib::

    for tag, _, _, j1, j2 in group: ...

Perhaps the most convincing argument is that ``_`` is used as the
wildcard in every other language we've looked at supporting pattern
matching: C#, Elixir, Erlang, F#, Haskell, Mathematica, OCaml, Ruby,
Rust, Scala, and Swift. Now, in general, we should not be concerned
too much with what another language does, since Python is clearly
different from all these languages. However, if there is such an
overwhelming and strong consensus, Python should not go out of its way
to do something completely different -- particularly given that ``_``
works well in Python and is already in use as a throwaway target.

Note that ``_`` is not assigned to by patterns -- this avoids
conflicts with the use of ``_`` as a marker for translatable strings
and an alias for ``gettext.gettext``, as recommended by the
``gettext`` module documentation.

Use some other syntax instead of ``|`` for OR patterns
------------------------------------------------------

A few alternatives to using ``|`` to separate the alternatives in OR
patterns have been proposed. Instead of::

    case 401|403|404:
        print("Some HTTP error")

the following proposals have been fielded:

- Use a comma::

    case 401, 403, 404:
        print("Some HTTP error")

  This looks too much like a tuple -- we would have to find a
  different way to spell tuples, and the construct would have to be
  parenthesized inside the argument list of a class pattern. In
  general, commas already have many different meanings in Python; we
  shouldn't add more.

- Allow stacked cases::

    case 401:
    case 403:
    case 404:
        print("Some HTTP error")

  This is how this would be done in C, using its fall-through
  semantics for cases. However, we don't want to mislead people into
  thinking that ``match``/``case`` uses fall-through semantics (which
  are a common source of bugs in C). Also, this would be a novel
  indentation pattern, which might make it harder to support in IDEs
  and such (it would break the simple rule "add an indentation level
  after a line ending in a colon"). Finally, this wouldn't support
  OR patterns nested inside other patterns.

- Use ``case in`` followed by a comma-separated list::

    case in 401, 403, 404:
        print("Some HTTP error")

  This wouldn't work for OR patterns nested inside other patterns,
  like::

    case Point(0|1, 0|1):
        print("A corner of the unit square")

- Use the ``or`` keyword::

    case 401 or 403 or 404:
        print("Some HTTP error")

  This could work, and the readability is not too different from using
  ``|``. Some users expressed a preference for ``or`` because they
  associate ``|`` with bitwise OR. However:

  1. Many other languages that have pattern matching use ``|`` (the
     list includes Elixir, Erlang, F#, Mathematica, OCaml, Ruby, Rust,
     and Scala).
  2. ``|`` is shorter, which may contribute to the readability of
     nested patterns like ``Point(0|1, 0|1)``.
  3. Some people mistakenly believe that ``|`` has the wrong priority;
     but since patterns don't support other operators it has the same
     priority as in expressions.
  4. Python users use ``or`` very frequently, and may build an
     impression that it is strongly associated with Boolean
     short-circuiting.
  5. ``|`` is used between alternatives in regular expressions
     and in EBNF grammars (like Python's own).
  6. ``|`` is not just used for bitwise OR -- it's used for set unions,
     dict merging (:pep:`584`) and is being considered as an
     alternative to ``typing.Union`` (:pep:`604`).
  7. ``|`` works better as a visual separator, especially between
     strings. Compare::

         case "spam" or "eggs" or "cheese":

     to::

         case "spam" | "eggs" | "cheese":

Add an ``else`` clause
----------------------

We decided not to add an ``else`` clause for several reasons.

- It is redundant, since we already have ``case _:``

- There will forever be confusion about the indentation level of the
  ``else:`` -- should it align with the list of cases or with the
  ``match`` keyword?

- Completionist arguments like "every other statement has one" are
  false -- only those statements have an ``else`` clause where it adds
  new functionality.

.. _deferred ideas:

Deferred Ideas
==============

There were a number of proposals to extend the matching syntax that we
decided to postpone for a possible future PEP. These fall into the realm of
"cool idea but not essential", and it was felt that it might be better to
acquire some real-world data on how the match statement will be used in
practice before moving forward with some of these proposals.

Note that in each case, the idea was judged to be a "two-way door",
meaning that there should be no backwards-compatibility issues with adding
these features later.

One-off syntax variant
|
|
|
|
|
----------------------
|
|
|
|
|
|
|
|
|
|
While inspecting some code-bases that may benefit the most from the proposed
|
|
|
|
|
syntax, it was found that single clause matches would be used relatively often,
|
|
|
|
|
mostly for various special-casing. In other languages this is supported in
|
|
|
|
|
the form of one-off matches. We proposed to support such one-off matches too::
|
|
|
|
|
|
|
|
|
|
    if match value as pattern [and guard]:
        ...
|
|
|
|
|
|
|
|
|
|
or, alternatively, without the ``if``::
|
|
|
|
|
|
|
|
|
|
    match value as pattern [if guard]:
        ...
|
|
|
|
|
|
|
|
|
|
as equivalent to the following expansion::
|
|
|
|
|
|
|
|
|
|
    match value:
        case pattern [if guard]:
            ...
|
|
|
|
|
|
|
|
|
|
To illustrate how this will benefit readability, consider this (slightly
|
|
|
|
|
simplified) snippet from real code::
|
|
|
|
|
|
|
|
|
|
    if isinstance(node, CallExpr):
        if (isinstance(node.callee, NameExpr) and len(node.args) == 1 and
                isinstance(node.args[0], NameExpr)):
            call = node.callee.name
            arg = node.args[0].name
            ...  # Continue special-casing 'call' and 'arg'
    ...  # Follow with common code
|
|
|
|
|
|
|
|
|
|
This can be rewritten in a more straightforward way as::
|
|
|
|
|
|
|
|
|
|
    if match node as CallExpr(callee=NameExpr(name=call), args=[NameExpr(name=arg)]):
        ...  # Continue special-casing 'call' and 'arg'
    ...  # Follow with common code
|
|
|
|
|
|
|
|
|
|
This one-off form would not allow ``elif match`` statements, as it was only
meant to handle a single pattern case. It was intended to be a special case
of a ``match`` statement, not a special case of an ``if`` statement::
|
|
|
|
|
|
|
|
|
|
    if match value_1 as pattern_1 [and guard_1]:
        ...
    elif match value_2 as pattern_2 [and guard_2]:  # Not allowed
        ...
    elif match value_3 as pattern_3 [and guard_3]:  # Not allowed
        ...
    else:  # Also not allowed
        ...
|
|
|
|
|
|
|
|
|
|
This would defeat the purpose of one-off matches as a complement to exhaustive
|
|
|
|
|
full matches - it's better and clearer to use a full match in this case.
|
|
|
|
|
|
|
|
|
|
Similarly, ``if not match`` would not be allowed, since ``match ... as ...`` is not
|
|
|
|
|
an expression. Nor do we propose a ``while match`` construct present in some languages
|
|
|
|
|
with pattern matching, since although it may be handy, it will likely be used
|
|
|
|
|
rarely.
|
|
|
|
|
|
|
|
|
|
Other pattern-based constructions
|
|
|
|
|
---------------------------------
|
|
|
|
|
|
|
|
|
|
Many other languages supporting pattern-matching use it as a basis for multiple
language constructs, including a matching operator, a generalized form
of assignment, a filter for loops, a method for synchronizing communication,
or specialized if statements. Some of these were mentioned in the discussion
of the first draft. Another question asked was why this particular form (joining
binding and conditional selection) was chosen while other forms were not.
|
|
|
|
|
|
|
|
|
|
Introducing more uses of patterns would be too bold and premature given the
limited experience we have using patterns, and would make this proposal too
complicated. The statement as presented provides a form of the feature that
is sufficiently general to be useful while being self-contained, and without
having a massive impact on the syntax and semantics of the language as a whole.
|
|
|
|
|
|
|
|
|
|
After some experience with this feature, the community may have a better
|
|
|
|
|
feeling for what other uses of pattern matching could be valuable in Python.
|
|
|
|
|
|
|
|
|
|
Algebraic matching of repeated names
|
|
|
|
|
------------------------------------
|
|
|
|
|
|
|
|
|
|
A technique occasionally seen in functional languages like Erlang and Elixir is
to use a match variable multiple times in the same pattern::
|
|
|
|
|
|
|
|
|
|
    match value:
        case Point(x, x):
            print("Point is on a diagonal!")
|
|
|
|
|
|
|
|
|
|
The idea here is that the first appearance of ``x`` would bind the value
|
|
|
|
|
to the name, and subsequent occurrences would verify that the incoming
|
|
|
|
|
value was equal to the value previously bound. If the value was not equal,
|
|
|
|
|
the match would fail.
|
|
|
|
|
|
|
|
|
|
However, there are a number of subtleties involved with mixing load-store
semantics for capture patterns. For the moment, we decided to make repeated
use of names within the same pattern an error; we can always relax this
restriction later without affecting backwards compatibility.
|
|
|
|
|
|
|
|
|
|
Note that you **can** use the same name more than once in alternate choices::
|
|
|
|
|
|
|
|
|
|
    match value:
        case x | [x]:
            # etc.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
.. _extended matching:
|
|
|
|
|
|
|
|
|
|
Custom matching protocol
|
|
|
|
|
------------------------
|
|
|
|
|
|
|
|
|
|
During the initial design discussions for this PEP, there were a lot of ideas
thrown around about custom matchers. There were a couple of motivations for
this:
|
|
|
|
|
|
|
|
|
|
* Some classes might want to expose a different set of "matchable" names
|
|
|
|
|
than the actual class properties.
|
|
|
|
|
* Some classes might have properties that are expensive to calculate, and
|
|
|
|
|
therefore shouldn't be evaluated unless the match pattern actually needed
|
|
|
|
|
access to them.
|
|
|
|
|
* There were ideas for exotic matchers such as ``IsInstance()``,
|
|
|
|
|
``InRange()``, ``RegexMatchingGroup()`` and so on.
|
|
|
|
|
* In order for built-in types and standard library classes to be able
|
|
|
|
|
to support matching in a reasonable and intuitive way, it was believed
|
|
|
|
|
that these types would need to implement special matching logic.
|
|
|
|
|
|
|
|
|
|
These customized match behaviors would be controlled by a special
|
|
|
|
|
``__match__`` method on the class name. There were two competing variants:
|
|
|
|
|
|
|
|
|
|
* A 'full-featured' match protocol which would pass in not only
  the subject to be matched, but detailed information about
  which attributes the specified pattern was interested in.
* A simplified match protocol, which only passed in the subject value,
  and which returned a "proxy object" (which in most cases could be
  just the subject) containing the matchable attributes.
|
|
|
|
|
|
|
|
|
|
Here's an example of one version of the more complex protocol proposed::
|
|
|
|
|
|
|
|
|
|
    match expr:
        case BinaryOp(left=Number(value=x), op=op, right=Number(value=y)):
            ...

    from types import PatternObject
    BinaryOp.__match__(
        (),
        {
            "left": PatternObject(Number, (), {"value": ...}, -1, False),
            "op": ...,
            "right": PatternObject(Number, (), {"value": ...}, -1, False),
        },
        -1,
        False,
    )
|
|
|
|
|
|
|
|
|
|
One drawback of this protocol is that the arguments to ``__match__``
would be expensive to construct, and could not be pre-computed because,
given the way names are bound, there are no real constants in Python.
It also meant that the ``__match__`` method would have to re-implement
much of the logic of matching which would otherwise be implemented in
C code in the Python VM. As a result, this option would perform poorly
compared to an equivalent ``if``-statement.
|
|
|
|
|
|
|
|
|
|
The simpler protocol suffered from the fact that although it was more
|
|
|
|
|
performant, it was much less flexible, and did not allow for many of
|
|
|
|
|
the creative custom matchers that people were dreaming up.
|
|
|
|
|
|
|
|
|
|
Late in the design process, however, it was realized that the need for
a custom matching protocol was much less than anticipated. Virtually
all the realistic (as opposed to fanciful) use cases brought up could
be handled by the built-in matching behavior, although in a few cases
an extra guard condition was required to get the desired effect.
|
|
|
|
|
|
|
|
|
|
Moreover, it turned out that none of the standard library classes really
|
|
|
|
|
needed any special matching support other than an appropriate
|
|
|
|
|
``__match_args__`` property.
|
|
|
|
|
|
|
|
|
|
The decision to postpone this feature came with a realization that this is
|
|
|
|
|
not a one-way door; that a more flexible and customizable matching protocol
|
|
|
|
|
can be added later, especially as we gain more experience with real-world
|
|
|
|
|
use cases and actual user needs.
|
|
|
|
|
|
|
|
|
|
The authors of this PEP expect that the ``match`` statement will evolve
|
|
|
|
|
over time as usage patterns and idioms evolve, in a way similar to what
|
|
|
|
|
other "multi-stage" PEPs have done in the past. When this happens, the
|
|
|
|
|
extended matching issue can be revisited.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Parameterized Matching Syntax
|
|
|
|
|
-----------------------------
|
|
|
|
|
|
|
|
|
|
(Also known as "Class Instance Matchers".)
|
|
|
|
|
|
|
|
|
|
This is another variant of the "custom match classes" idea that would allow
|
|
|
|
|
diverse kinds of custom matchers mentioned in the previous section -- however,
|
|
|
|
|
instead of using an extended matching protocol, it would be achieved by
|
|
|
|
|
introducing an additional pattern type with its own syntax. This pattern type
|
|
|
|
|
would accept two distinct sets of parameters: one set which consists of the
|
|
|
|
|
actual parameters passed into the pattern object's constructor, and another
|
|
|
|
|
set representing the binding variables for the pattern.
|
|
|
|
|
|
|
|
|
|
The ``__match__`` method of these objects could use the constructor parameter
|
|
|
|
|
values in deciding what was a valid match.
|
|
|
|
|
|
|
|
|
|
This would allow patterns such as ``InRange<0, 6>(value)``, which would match
a number in the range 0..6 and assign the matched value to 'value'. Similarly,
one could have a pattern which tests for the existence of a named group in
a regular expression match result (different meaning of the word 'match').
|
|
|
|
|
|
|
|
|
|
Although there was some support for this idea, there was a lot of bikeshedding
on the syntax (there are not a lot of attractive options available)
and no clear consensus was reached, so it was decided that for now, this
feature is not essential to the PEP.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Pattern Utility Library
|
|
|
|
|
-----------------------
|
|
|
|
|
|
|
|
|
|
Both of the previous ideas would be accompanied by a new Python standard
library module which would contain a rich set of exotic and useful matchers.
However, it is not really possible to implement such a library without
adopting one of the extended pattern proposals given in the previous sections,
so this idea is also deferred.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Acknowledgments
|
|
|
|
|
===============
|
|
|
|
|
|
|
|
|
|
We are grateful to the following individuals (among many others) for
helping out during various phases of the writing of this PEP:
|
|
|
|
|
|
|
|
|
|
- Gregory P. Smith
- Jim Jewett
- Mark Shannon
- Nate Lust
- Taine Zhao
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Version History
|
|
|
|
|
===============
|
|
|
|
|
|
|
|
|
|
1. Initial version
|
|
|
|
|
|
|
|
|
|
2. Substantial rewrite, including:

   - Minor clarifications, grammar and typo corrections
   - Rename various concepts
   - Additional discussion of rejected ideas, including:

     - Why we choose ``_`` for wildcard patterns
     - Why we choose ``|`` for OR patterns
     - Why we choose not to use special syntax for capture variables
     - Why this pattern matching operation and not others

   - Clarify exception and side effect semantics
   - Clarify partial binding semantics
   - Drop restriction on use of ``_`` in load contexts
   - Drop the default single positional argument being the whole
     subject except for a handful of built-in types
   - Simplify behavior of ``__match_args__``
   - Drop the ``__match__`` protocol (moved to `deferred ideas`_)
   - Drop ``ImpossibleMatchError`` exception
   - Drop leading dot for loads (moved to `deferred ideas`_)
   - Reworked the initial sections (everything before `syntax`_)
   - Added an overview of all the types of patterns before the
     detailed description
   - Added simplified syntax next to the description of each pattern
   - Separate description of the wildcard from capture patterns
   - Added Daniel F Moisset as sixth co-author
|
|
|
|
|
|
|
|
|
|
References
|
|
|
|
|
==========
|
|
|
|
|
|
|
|
|
|
.. [1]
|
|
|
|
|
https://en.wikipedia.org/wiki/Pattern_matching
|
|
|
|
|
|
|
|
|
|
.. [2]
|
|
|
|
|
https://en.wikipedia.org/wiki/Algebraic_data_type
|
|
|
|
|
|
|
|
|
|
.. [3]
|
|
|
|
|
https://doc.rust-lang.org/reference/patterns.html
|
|
|
|
|
|
|
|
|
|
.. [4]
|
|
|
|
|
https://docs.scala-lang.org/tour/pattern-matching.html
|
|
|
|
|
|
|
|
|
|
.. [5]
|
|
|
|
|
https://docs.python.org/3/library/dataclasses.html
|
|
|
|
|
|
|
|
|
|
.. [6]
|
|
|
|
|
https://docs.python.org/3/library/typing.html
|
|
|
|
|
|
|
|
|
|
.. [7]
|
|
|
|
|
https://github.com/gvanrossum/patma/blob/master/EXAMPLES.md
|
|
|
|
|
|
|
|
|
|
.. [8]
|
|
|
|
|
https://dl.acm.org/doi/abs/10.1145/2480360.2384582
|
|
|
|
|
|
|
|
|
|
.. [9]
|
|
|
|
|
https://black.readthedocs.io/en/stable/
|
|
|
|
|
|
|
|
|
|
.. [10]
|
|
|
|
|
https://github.com/davidhalter/parso
|
|
|
|
|
|
|
|
|
|
.. [11]
|
|
|
|
|
https://github.com/Instagram/LibCST
|
|
|
|
|
|
|
|
|
|
.. [12]
|
|
|
|
|
https://mybinder.org
|
|
|
|
|
|
|
|
|
|
.. [13]
|
|
|
|
|
https://jupyter.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
.. _Appendix A:
|
|
|
|
|
|
|
|
|
|
Appendix A -- Full Grammar
|
|
|
|
|
==========================
|
|
|
|
|
|
|
|
|
|
Here is the full grammar for ``match_stmt``. This is an additional
|
|
|
|
|
alternative for ``compound_stmt``. It should be understood that
|
|
|
|
|
``match`` and ``case`` are soft keywords, i.e. they are not reserved
|
|
|
|
|
words in other grammatical contexts (including at the start of a line
|
|
|
|
|
if there is no colon where expected). By convention, hard keywords
|
|
|
|
|
use single quotes while soft keywords use double quotes.
|
|
|
|
|
|
|
|
|
|
Other notation used beyond standard EBNF:
|
|
|
|
|
|
|
|
|
|
- ``SEP.RULE+`` is shorthand for ``RULE (SEP RULE)*``
|
|
|
|
|
- ``!RULE`` is a negative lookahead assertion
|
|
|
|
|
|
|
|
|
|
::
|
|
|
|
|
|
|
|
|
|
    match_expr:
        | star_named_expression ',' star_named_expressions?
        | named_expression
    match_stmt: "match" match_expr ':' NEWLINE INDENT case_block+ DEDENT
    case_block: "case" patterns [guard] ':' block
    guard: 'if' named_expression
    patterns: value_pattern ',' [values_pattern] | pattern
    pattern: walrus_pattern | or_pattern
    walrus_pattern: NAME ':=' or_pattern
    or_pattern: '|'.closed_pattern+
    closed_pattern:
        | capture_pattern
        | literal_pattern
        | constant_pattern
        | group_pattern
        | sequence_pattern
        | mapping_pattern
        | class_pattern
    capture_pattern: NAME !('.' | '(' | '=')
    literal_pattern:
        | signed_number !('+' | '-')
        | signed_number '+' NUMBER
        | signed_number '-' NUMBER
        | strings
        | 'None'
        | 'True'
        | 'False'
    constant_pattern: attr !('.' | '(' | '=')
    group_pattern: '(' patterns ')'
    sequence_pattern: '[' [values_pattern] ']' | '(' ')'
    mapping_pattern: '{' items_pattern? '}'
    class_pattern:
        | name_or_attr '(' ')'
        | name_or_attr '(' ','.pattern+ ','? ')'
        | name_or_attr '(' ','.keyword_pattern+ ','? ')'
        | name_or_attr '(' ','.pattern+ ',' ','.keyword_pattern+ ','? ')'
    signed_number: NUMBER | '-' NUMBER
    attr: name_or_attr '.' NAME
    name_or_attr: attr | NAME
    values_pattern: ','.value_pattern+ ','?
    items_pattern: ','.key_value_pattern+ ','?
    keyword_pattern: NAME '=' or_pattern
    value_pattern: '*' capture_pattern | pattern
    key_value_pattern:
        | (literal_pattern | constant_pattern) ':' or_pattern
        | '**' capture_pattern
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Copyright
|
|
|
|
|
=========
|
|
|
|
|
|
|
|
|
|
This document is placed in the public domain or under the
|
|
|
|
|
CC0-1.0-Universal license, whichever is more permissive.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
..
|
|
|
|
|
Local Variables:
|
|
|
|
|
mode: indented-text
|
|
|
|
|
indent-tabs-mode: nil
|
|
|
|
|
sentence-end-double-space: t
|
|
|
|
|
fill-column: 70
|
|
|
|
|
coding: utf-8
|
|
|
|
|
End:
|