2020-06-23 11:27:36 -04:00
|
|
|
|
PEP: 622
Title: Structural Pattern Matching
Version: $Revision$
Last-Modified: $Date$
Author: Brandt Bucher <brandtbucher@gmail.com>,
        Tobias Kohn <kohnt@tobiaskohn.ch>,
        Ivan Levkivskyi <levkivskyi@gmail.com>,
        Guido van Rossum <guido@python.org>,
        Talin <viridia@gmail.com>
BDFL-Delegate:
Discussions-To: Python-Dev <python-dev@python.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 23-Jun-2020
Python-Version: 3.10
Post-History: 23-Jun-2020
Resolution:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Abstract
|
|
|
|
|
========
|
|
|
|
|
|
|
|
|
|
This PEP proposes adding pattern matching statements [1]_ to Python in
|
|
|
|
|
order to create more expressive ways of handling structured
|
|
|
|
|
heterogeneous data. The authors take a holistic approach, providing
|
|
|
|
|
both static and runtime specifications.
|
|
|
|
|
|
|
|
|
|
:pep:`275` and :pep:`3103` previously proposed similar constructs, and
were rejected. Instead of targeting the optimization of
``if ... elif ... else`` statements (as those PEPs did), this design
focuses on generalizing sequence, mapping, and object destructuring.
It uses syntactic features made possible by :pep:`617`, which
introduced a more powerful method of parsing Python source code.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Rationale and Goals
|
|
|
|
|
===================
|
|
|
|
|
|
|
|
|
|
Let us start from some anecdotal evidence: ``isinstance()`` is one of the most
|
|
|
|
|
frequently called functions in large-scale Python codebases (by static call count).
|
|
|
|
|
In particular, when analyzing a multi-million-line production codebase,
|
|
|
|
|
it was discovered that ``isinstance()`` is the second most called builtin
|
|
|
|
|
function (after ``len()``). Even taking into account builtin classes, it is
|
|
|
|
|
still in the top ten. Most such calls are followed by specific attribute
|
|
|
|
|
access.
|
|
|
|
|
|
|
|
|
|
There are two possible conclusions that can be drawn from this information:
|
|
|
|
|
|
|
|
|
|
* Handling of heterogeneous data (i.e. situations where a variable can take
|
|
|
|
|
  values of multiple types) is common in real-world code.
|
|
|
|
|
|
|
|
|
|
* Python doesn't have expressive ways of destructuring object data (i.e.
|
|
|
|
|
separating the content of an object into multiple variables).
|
|
|
|
|
|
|
|
|
|
This contrasts with Python's strengths on the opposite side of each aspect:
|
|
|
|
|
|
|
|
|
|
* Its success in the numeric world indicates that Python is good when
|
|
|
|
|
working with homogeneous data. It also has builtin support for homogeneous
|
|
|
|
|
  data structures such as lists and arrays, and semantic constructs such
|
|
|
|
|
as iterators and generators.
|
|
|
|
|
|
|
|
|
|
* Python is expressive and flexible at constructing objects. It has syntactic
|
|
|
|
|
support for collection literals and comprehensions. Custom objects can be
|
|
|
|
|
  created using positional and keyword calls that are customized by the special
  ``__init__()`` method.
|
|
|
|
|
|
|
|
|
|
This PEP aims at improving the support for destructuring heterogeneous data
|
|
|
|
|
by adding dedicated syntactic support for it in the form of pattern matching.
At a very high level it is similar to regular expressions, but instead of
matching strings, it will be possible to match arbitrary Python objects.
|
|
|
|
|
|
|
|
|
|
We believe this will improve both readability and reliability of relevant code.
|
|
|
|
|
To illustrate the readability improvement, let us consider an actual example
|
|
|
|
|
from the Python standard library::
|
|
|
|
|
|
|
|
|
|
    def is_tuple(node):
        if isinstance(node, Node) and node.children == [LParen(), RParen()]:
            return True
        return (isinstance(node, Node)
                and len(node.children) == 3
                and isinstance(node.children[0], Leaf)
                and isinstance(node.children[1], Node)
                and isinstance(node.children[2], Leaf)
                and node.children[0].value == "("
                and node.children[2].value == ")")
|
|
|
|
|
|
|
|
|
|
With the syntax proposed in this PEP it can be rewritten as below. Note that
|
|
|
|
|
the proposed code will work without any modifications to the definition of
|
|
|
|
|
``Node`` and other classes here::
|
|
|
|
|
|
|
|
|
|
    def is_tuple(node: Node) -> bool:
        match node:
            case Node(children=[LParen(), RParen()]):
                return True
            case Node(children=[Leaf(value="("), Node(), Leaf(value=")")]):
                return True
            case _:
                return False
|
|
|
|
|
|
|
|
|
|
See the `syntax`_ section below for a more detailed specification.
|
|
|
|
|
|
2020-07-01 11:37:47 -04:00
|
|
|
|
We propose that destructuring
|
|
|
|
|
objects can be customized by a new special ``__match_args__``
|
|
|
|
|
attribute. As part of this PEP we specify the general API and its
|
|
|
|
|
implementation for some standard library classes (including named
|
|
|
|
|
tuples and dataclasses). See the `runtime`_ section below.
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
Finally, we aim to provide comprehensive support for static type checkers
|
|
|
|
|
and similar tools. For this purpose we propose to introduce a
|
|
|
|
|
``@typing.sealed`` class decorator that will be a no-op at runtime, but
|
|
|
|
|
will indicate to static tools that all subclasses of this class must be defined
|
|
|
|
|
in the same module. This will allow effective static exhaustiveness checks,
|
|
|
|
|
and together with dataclasses, will provide good support for algebraic data
|
|
|
|
|
types [2]_. See the `static checkers`_ section for more details.
|
|
|
|
|
|
|
|
|
|
In general, we believe that pattern matching has proven to be a useful and
|
|
|
|
|
expressive tool in various modern languages. In particular, many aspects of
|
|
|
|
|
this PEP were inspired by how pattern matching works in Rust [3]_ and
|
|
|
|
|
Scala [4]_.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
.. _syntax:
|
|
|
|
|
|
|
|
|
|
Syntax and Semantics
|
|
|
|
|
====================
|
|
|
|
|
|
|
|
|
|
Case clauses
|
|
|
|
|
------------
|
|
|
|
|
|
|
|
|
|
A simplified, approximate grammar for the proposed syntax is::
|
|
|
|
|
|
|
|
|
|
    ...
    compound_statement:
        | if_stmt
        ...
        | match_stmt
    match_stmt: "match" expression ':' NEWLINE INDENT case_block+ DEDENT
    case_block: "case" pattern [guard] ':' block
    guard: 'if' expression
    pattern: walrus_pattern | or_pattern
    walrus_pattern: NAME ':=' or_pattern
    or_pattern: closed_pattern ('|' closed_pattern)*
    closed_pattern:
        | literal_pattern
        | capture_pattern
        | constant_pattern
        | sequence_pattern
        | mapping_pattern
        | class_pattern
|
|
|
|
|
|
|
|
|
|
(See `Appendix A`_ for the full, unabridged grammar.)
|
|
|
|
|
|
|
|
|
|
We propose the match syntax to be a statement, not an expression. Although in
|
|
|
|
|
many languages it is an expression, being a statement better suits the general
|
|
|
|
|
logic of Python syntax. See `rejected ideas`_ for more discussion. The list of
|
|
|
|
|
allowed patterns is specified below in the `patterns`_ subsection.
|
|
|
|
|
|
2020-06-23 19:28:16 -04:00
|
|
|
|
The ``match`` and ``case`` keywords are proposed to be soft keywords,
|
|
|
|
|
so that they are recognized as keywords at the beginning of a match
|
|
|
|
|
statement or case block respectively, but are allowed to be used in
|
|
|
|
|
other places as variable or argument names.
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
The proposed indentation structure is as follows::
|
|
|
|
|
|
|
|
|
|
    match some_expression:
        case pattern_1:
            ...
        case pattern_2:
            ...
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Match semantics
|
|
|
|
|
---------------
|
|
|
|
|
|
|
|
|
|
At a high level, the proposed semantics for choosing the match are to choose the first
|
|
|
|
|
matching pattern and execute the corresponding suite. The remaining patterns
|
2020-06-24 01:15:47 -04:00
|
|
|
|
are not tried. If there are no matching patterns, the statement 'falls
|
2020-06-23 11:27:36 -04:00
|
|
|
|
through', and execution continues at the following statement.
|
|
|
|
|
|
|
|
|
|
Essentially this is equivalent to a chain of ``if ... elif ... else``
|
|
|
|
|
statements. Note that unlike for the previously proposed ``switch`` statement,
|
|
|
|
|
the pre-computed dispatch dictionary semantics does not apply here.
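
The first-match-wins rule can be illustrated without the new syntax. The
following is a minimal sketch rather than the proposed implementation;
``first_match`` and the ``(predicate, action)`` pairs are hypothetical names
introduced only for this illustration.

```python
def first_match(subject, cases):
    """Try each (predicate, action) pair in order and run only the first hit.

    Returns the action's result, or None when no case matches, mirroring
    how a match statement without a matching case simply falls through.
    """
    for predicate, action in cases:
        if predicate(subject):
            return action(subject)
    return None  # no case matched: execution continues after the statement


cases = [
    (lambda v: v == 0, lambda v: "zero"),
    (lambda v: isinstance(v, int), lambda v: "some int"),
    (lambda v: True, lambda v: "anything"),  # plays the role of ``case _``
]

print(first_match(0, cases))       # all three predicates hold; only the first runs
print(first_match("text", cases))
```

Because the cases are tried sequentially, their order is significant, just as
it is in an ``if ... elif ... else`` chain.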
|
|
|
|
|
|
|
|
|
|
There is no ``default`` or ``else`` case - instead the special wildcard
|
2020-06-29 15:11:20 -04:00
|
|
|
|
``_`` can be used (see the section on `capture_pattern`_) as a final
|
2020-06-23 11:27:36 -04:00
|
|
|
|
'catch-all' pattern.
|
|
|
|
|
|
|
|
|
|
Name bindings made during a successful pattern match outlive the executed suite
|
|
|
|
|
and can be used after the match statement. This follows the logic of other
|
|
|
|
|
Python statements that can bind names, such as the ``for`` loop and the ``with``
statement. For example::
|
|
|
|
|
|
|
|
|
|
    match shape:
        case Point(x, y):
            ...
        case Rectangle(x, y, _, _):
            ...
    print(x, y)  # This works
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
.. _patterns:
|
|
|
|
|
|
|
|
|
|
Allowed patterns
|
|
|
|
|
----------------
|
|
|
|
|
|
|
|
|
|
We introduce the proposed syntax gradually. Here we start from the main
|
|
|
|
|
building blocks. The following patterns are supported:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
.. _literal_pattern:
|
|
|
|
|
|
2020-06-29 15:11:20 -04:00
|
|
|
|
Literal Patterns
|
|
|
|
|
~~~~~~~~~~~~~~~~
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
A literal pattern consists of a simple literal like a string, a number,
|
2020-06-24 20:06:42 -04:00
|
|
|
|
a Boolean literal (``True`` or ``False``), or ``None``::
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
    match number:
        case 0:
            print("Nothing")
        case 1:
            print("Just one")
        case 2:
            print("A couple")
        case -1:
            print("One less than nothing")
        case 1-1j:
            print("Good luck with that...")
|
|
|
|
|
|
|
|
|
|
A literal pattern uses equality with the literal on the right-hand side, so that
in the above example ``number == 0`` and then possibly ``number == 1``, etc. will
|
|
|
|
|
be evaluated. Note that although technically negative numbers
|
|
|
|
|
are represented using unary minus, they are considered
|
|
|
|
|
literals for the purpose of pattern matching. Unary plus is not allowed.
|
|
|
|
|
Binary plus and minus are allowed only to join a real number and an imaginary
|
|
|
|
|
number to form a complex number, such as ``1+1j``.
|
|
|
|
|
|
2020-06-24 20:06:42 -04:00
|
|
|
|
Note that because equality (``__eq__``) is used, and because Booleans are
equal to the integers ``0`` and ``1``, there is no
|
|
|
|
|
practical difference between the following two::
|
|
|
|
|
|
|
|
|
|
    case True:
        ...

    case 1:
        ...
|
|
|
|
|
|
2020-06-23 19:28:16 -04:00
|
|
|
|
Triple-quoted strings are supported. Raw strings and byte strings
|
2020-06-23 11:27:36 -04:00
|
|
|
|
are supported. F-strings are not allowed (since in general they are not
|
|
|
|
|
really literals).
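
Because literal patterns compare by equality, their behavior can be previewed
with ordinary ``==`` comparisons. The sketch below assumes nothing beyond that
rule; ``matches_literal`` is a hypothetical helper, not part of the proposal.

```python
def matches_literal(subject, literal):
    """Approximate a literal pattern: the literal is the right-hand operand of ==."""
    return subject == literal


print(matches_literal(True, 1))          # True: bool is a subclass of int
print(matches_literal(1.0, 1))           # True: numeric equality ignores the type
print(matches_literal(1 - 1j, 1 - 1j))   # True
print(matches_literal("1", 1))           # False
```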
|
|
|
|
|
|
|
|
|
|
|
2020-06-29 15:11:20 -04:00
|
|
|
|
.. _capture_pattern:
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
2020-06-29 15:11:20 -04:00
|
|
|
|
Capture Patterns
|
|
|
|
|
~~~~~~~~~~~~~~~~
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
2020-06-29 15:11:20 -04:00
|
|
|
|
A capture pattern serves as an assignment target for the matched expression::
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
    match greeting:
        case "":
            print("Hello!")
        case name:
            print(f"Hi {name}!")
|
|
|
|
|
|
2020-06-29 15:11:20 -04:00
|
|
|
|
A capture pattern always succeeds. A capture pattern appearing in a scope makes
|
2020-06-23 11:27:36 -04:00
|
|
|
|
the name local to that scope. For example, using ``name`` after the above
|
|
|
|
|
snippet may raise ``UnboundLocalError`` rather than ``NameError``, if
|
|
|
|
|
the ``""`` case clause was taken::
|
|
|
|
|
|
|
|
|
|
    match greeting:
        case "":
            print("Hello!")
        case name:
            print(f"Hi {name}!")
    if name == "Santa":      # <-- might raise UnboundLocalError
        ...                  # but works fine if greeting was not empty
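
The same scoping behavior already exists for ordinary assignments, so it can
be reproduced in current Python. In this sketch the ``else`` branch stands in
for the ``case name:`` clause; the function ``greet`` is a hypothetical name
used only for illustration.

```python
def greet(greeting):
    # ``name`` is assigned somewhere in this function, so the compiler
    # treats it as a local variable for the whole function body.
    if greeting == "":
        print("Hello!")
    else:
        name = greeting  # plays the role of the capture pattern
        print(f"Hi {name}!")
    return name == "Santa"  # unbound if the first branch was taken


print(greet("Santa"))
try:
    greet("")
except UnboundLocalError as exc:
    print("UnboundLocalError:", exc)
```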
|
|
|
|
|
|
|
|
|
|
While matching against each case clause, a name may be bound at most
|
2020-06-29 15:11:20 -04:00
|
|
|
|
once; having two capture patterns with coinciding names is an error. An
|
2020-06-23 11:27:36 -04:00
|
|
|
|
exception is made for the special single underscore (``_``) name; in
|
|
|
|
|
patterns, it's a wildcard that *never* binds::
|
|
|
|
|
|
|
|
|
|
    match data:
        case [x, x]:  # Error!
            ...
        case [_, _]:
            print("Some pair")
            print(_)  # Error!
|
|
|
|
|
|
|
|
|
|
Note: one can still match on a collection with equal items using `guards`_.
|
|
|
|
|
Also, ``[x, y] | Point(x, y)`` is a legal pattern because the two
|
|
|
|
|
alternatives are never matched at the same time.
|
|
|
|
|
|
2020-06-24 21:20:19 -04:00
|
|
|
|
Reminder: ``None``, ``False`` and ``True`` are keywords denoting
|
|
|
|
|
literals, not names.
|
2020-06-24 20:06:42 -04:00
|
|
|
|
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
.. _constant_value_pattern:
|
|
|
|
|
|
2020-06-29 15:11:20 -04:00
|
|
|
|
Constant Value Patterns
|
|
|
|
|
~~~~~~~~~~~~~~~~~~~~~~~
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
This is used to match against constants and enum values.
|
|
|
|
|
Every dotted name in a pattern is looked up using normal Python name
|
|
|
|
|
resolution rules, and the value is used for comparison by equality with
|
|
|
|
|
the matching expression (same as for literals). As a special case to avoid
|
2020-06-29 15:11:20 -04:00
|
|
|
|
ambiguity with capture patterns, simple names must be prefixed with a dot to be
|
2020-06-23 11:27:36 -04:00
|
|
|
|
considered a reference::
|
|
|
|
|
|
|
|
|
|
    from enum import Enum

    class Color(Enum):
        BLACK = 1
        RED = 2

    BLACK = 1
    RED = 2

    match color:
        case .BLACK | Color.BLACK:
            print("Black suits every color")
        case BLACK:  # This will just assign a new value to BLACK.
            ...
|
|
|
|
|
|
|
|
|
|
The leading dot can be omitted if the name is already dotted, but
|
|
|
|
|
adding it is not prohibited, so ``.Color.BLACK`` is the same as ``Color.BLACK``.
|
|
|
|
|
See `rejected ideas`_ for other syntactic alternatives that were considered
|
|
|
|
|
for constant value patterns.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
.. _sequence_pattern:
|
|
|
|
|
|
2020-06-29 15:11:20 -04:00
|
|
|
|
Sequence Patterns
|
|
|
|
|
~~~~~~~~~~~~~~~~~
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
A sequence pattern follows the same semantics as unpacking assignment.
|
|
|
|
|
Like unpacking assignment, both tuple-like and list-like syntax can be
|
|
|
|
|
used, with identical semantics. Each element can be an arbitrary
|
|
|
|
|
pattern; there may also be at most one ``*name`` pattern to catch all
|
|
|
|
|
remaining items::
|
|
|
|
|
|
|
|
|
|
    match collection:
        case 1, [x, *others]:
            print("Got 1 and a nested sequence")
        case (1, x):
            print(f"Got 1 and {x}")
|
|
|
|
|
|
|
|
|
|
To match a sequence pattern the target must be an instance of
|
|
|
|
|
``collections.abc.Sequence``, and it cannot be any kind of string
|
|
|
|
|
(``str``, ``bytes``, ``bytearray``). It cannot be an iterator. For matching
|
|
|
|
|
on a specific collection class, see class patterns below.
|
|
|
|
|
|
|
|
|
|
The ``_`` wildcard can be starred to match sequences of varying lengths. For
|
|
|
|
|
example:
|
|
|
|
|
|
|
|
|
|
* ``[*_]`` matches a sequence of any length.
|
|
|
|
|
* ``(_, _, *_)`` matches any sequence of length two or more.
|
|
|
|
|
* ``["a", *_, "z"]`` matches any sequence of length two or more that starts with
|
|
|
|
|
``"a"`` and ends with ``"z"``.
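
The target restrictions and the starred wildcard can be approximated with
ordinary checks. This is a sketch under the rules stated above, not the
proposed implementation; ``is_sequence_target`` and ``matches_a_to_z`` are
hypothetical helper names.

```python
from collections.abc import Sequence


def is_sequence_target(obj):
    """True if obj may match a sequence pattern: a Sequence that is not string-like."""
    return isinstance(obj, Sequence) and not isinstance(obj, (str, bytes, bytearray))


def matches_a_to_z(obj):
    """Approximate the pattern ``["a", *_, "z"]``."""
    return (
        is_sequence_target(obj)
        and len(obj) >= 2      # the two fixed items must be distinct positions
        and obj[0] == "a"
        and obj[-1] == "z"
    )


print(matches_a_to_z(["a", "z"]))         # True
print(matches_a_to_z(("a", 1, 2, "z")))   # True: tuples are sequences too
print(matches_a_to_z("az"))               # False: strings are excluded
print(matches_a_to_z(iter(["a", "z"])))   # False: iterators are not sequences
```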
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
.. _mapping_pattern:
|
|
|
|
|
|
2020-06-29 15:11:20 -04:00
|
|
|
|
Mapping Patterns
|
|
|
|
|
~~~~~~~~~~~~~~~~
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
A mapping pattern is a generalization of iterable unpacking to mappings.
|
|
|
|
|
Its syntax is similar to dictionary display but each key and value are
|
|
|
|
|
patterns: ``"{" (pattern ":" pattern)+ "}"``. A ``**name`` pattern is also
|
|
|
|
|
allowed, to extract the remaining items. Only literal and constant value
|
|
|
|
|
patterns are allowed in key positions::
|
|
|
|
|
|
|
|
|
|
    import constants

    match config:
        case {"route": route}:
            process_route(route)
        case {constants.DEFAULT_PORT: sub_config, **rest}:
            process_config(sub_config, rest)
|
|
|
|
|
|
|
|
|
|
The target must be an instance of ``collections.abc.Mapping``.
|
|
|
|
|
Extra keys in the target are ignored even if ``**rest`` is not present.
|
|
|
|
|
This is different from sequence patterns, where extra items will cause a
|
|
|
|
|
match to fail. But mappings are actually different from sequences: they
|
|
|
|
|
have natural structural sub-typing behavior, i.e., passing a dictionary
|
|
|
|
|
with extra keys somewhere will likely just work.
|
|
|
|
|
|
|
|
|
|
For this reason, ``**_`` is invalid in mapping patterns; it would always be a
|
|
|
|
|
no-op that could be removed without consequence.
|
|
|
|
|
|
|
|
|
|
Matched key-value pairs must already be present in the mapping, and not created
|
|
|
|
|
on-the-fly by ``__missing__`` or ``__getitem__``. For example,
|
|
|
|
|
``collections.defaultdict`` instances will only match patterns with keys that
|
|
|
|
|
were already present when the ``match`` block was entered.
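
The rule about keys that must already be present can be sketched as follows.
The draft does not prescribe how presence is tested; using ``get()`` with a
sentinel is an assumption of this sketch, and ``match_mapping`` is a
hypothetical helper name.

```python
from collections import defaultdict
from collections.abc import Mapping

_MISSING = object()  # sentinel distinguishing "absent" from a stored None


def match_mapping(target, required_keys):
    """Approximate a mapping pattern whose keys are ``required_keys``.

    Returns a dict of the matched values, or None if the match fails.
    Extra keys in the target are ignored.
    """
    if not isinstance(target, Mapping):
        return None
    found = {}
    for key in required_keys:
        # get() never triggers __missing__, so nothing is created on the fly.
        value = target.get(key, _MISSING)
        if value is _MISSING:
            return None
        found[key] = value
    return found


config = defaultdict(list, route="/home")
print(match_mapping(config, ["route"]))   # {'route': '/home'}
print(match_mapping(config, ["port"]))    # None
print("port" in config)                   # False: the defaultdict was not modified
```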
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
.. _class_pattern:
|
|
|
|
|
|
2020-06-29 15:11:20 -04:00
|
|
|
|
Class Patterns
|
|
|
|
|
~~~~~~~~~~~~~~
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
A class pattern provides support for destructuring arbitrary objects.
|
|
|
|
|
There are two possible ways of matching on object attributes: by position
|
2020-06-24 20:04:34 -04:00
|
|
|
|
like ``Point(1, 2)``, and by name like ``Point(x=1, y=2)``. These
|
2020-06-23 11:27:36 -04:00
|
|
|
|
two can be combined, but a positional match cannot follow a match by name.
|
|
|
|
|
Each item in a class pattern can be an arbitrary pattern. A simple
|
|
|
|
|
example::
|
|
|
|
|
|
|
|
|
|
    match shape:
        case Point(x, y):
            ...
        case Rectangle(x0, y0, x1, y1, painted=True):
            ...
|
|
|
|
|
|
2020-07-01 11:37:47 -04:00
|
|
|
|
Whether a match succeeds or not is determined by the equivalent of an
|
|
|
|
|
``isinstance`` call. If the target (``shape``, in the example) is not
|
|
|
|
|
an instance of the named class (``Point`` or ``Rectangle``), the match
|
|
|
|
|
fails. Otherwise, it continues (see details in the `runtime`_
|
|
|
|
|
section).
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
The named class must be an instance of ``type``. It may be a single name
|
|
|
|
|
or a dotted name (e.g. ``some_mod.SomeClass`` or ``mod.pkg.Class``).
|
|
|
|
|
The leading name must not be ``_``, so e.g. ``_(...)`` and
|
|
|
|
|
``_.C(...)`` are invalid. Use ``object(foo=_)`` to check whether the
|
|
|
|
|
matched object has an attribute ``foo``.
|
|
|
|
|
|
2020-07-01 11:37:47 -04:00
|
|
|
|
By default, sub-patterns may only be matched by keyword for
|
|
|
|
|
user-defined classes. In order to support positional sub-patterns, a
|
|
|
|
|
custom ``__match_args__`` attribute is required.
|
|
|
|
|
The runtime allows matching against
|
|
|
|
|
arbitrarily nested patterns by chaining all of the instance checks and
|
|
|
|
|
attribute lookups appropriately.
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
|
2020-07-02 01:11:53 -04:00
|
|
|
|
Combining multiple patterns (OR patterns)
|
|
|
|
|
-----------------------------------------
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
Multiple alternative patterns can be combined into one using ``|``. This means
|
2020-06-24 20:25:13 -04:00
|
|
|
|
the whole pattern matches if at least one alternative matches.
|
2020-06-23 11:27:36 -04:00
|
|
|
|
Alternatives are tried from left to right and have a short-circuit property:
subsequent patterns are not tried once one has matched. Examples::
|
|
|
|
|
|
|
|
|
|
    match something:
        case 0 | 1 | 2:
            print("Small number")
        case [] | [_]:
            print("A short sequence")
        case str() | bytes():
            print("Something string-like")
        case _:
            print("Something else")
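
The left-to-right, short-circuit evaluation of alternatives can be sketched
with plain callables. Here each alternative is modeled as a function that
returns a dict of bindings on success or ``None`` on failure; ``match_or``,
``literal``, and ``tried`` are hypothetical names for illustration only.

```python
def match_or(subject, alternatives):
    """Return the bindings of the first alternative that matches, else None."""
    for alternative in alternatives:
        bindings = alternative(subject)
        if bindings is not None:
            return bindings  # later alternatives are never tried
    return None


tried = []  # records which alternatives were actually evaluated


def literal(value):
    """Build an alternative that matches by equality and binds nothing."""
    def pattern(subject):
        tried.append(value)
        return {} if subject == value else None
    return pattern


result = match_or(1, [literal(0), literal(1), literal(2)])  # like ``0 | 1 | 2``
print(result, tried)
```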
|
|
|
|
|
|
|
|
|
|
The alternatives may bind variables, as long as each alternative binds
|
|
|
|
|
the same set of variables (excluding ``_``). For example::
|
|
|
|
|
|
|
|
|
|
    match something:
        case 1 | x:  # Error!
            ...
        case x | 1:  # Error!
            ...
        case one := [1] | two := [2]:  # Error!
            ...
        case Foo(arg=x) | Bar(arg=x):  # Valid, both arms bind 'x'
            ...
        case [x] | x:  # Valid, both arms bind 'x'
            ...
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
.. _guards:
|
|
|
|
|
|
|
|
|
|
Guards
|
|
|
|
|
------
|
|
|
|
|
|
|
|
|
|
Each *top-level* pattern can be followed by a guard of the form
|
|
|
|
|
``if expression``. A case clause succeeds if the pattern matches and the guard
|
2020-06-23 18:20:24 -04:00
|
|
|
|
evaluates to a true value. For example::
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
    match input:
        case [x, y] if x > MAX_INT and y > MAX_INT:
            print("Got a pair of large numbers")
        case x if x > MAX_INT:
            print("Got a large number")
        case [x, y] if x == y:
            print("Got equal items")
        case _:
            print("Not an outstanding input")
|
|
|
|
|
|
|
|
|
|
If evaluating a guard raises an exception, it is propagated onwards rather
|
|
|
|
|
than failing the case clause. Names that appear in a pattern are bound before the
|
|
|
|
|
guard succeeds. So this will work::
|
|
|
|
|
|
|
|
|
|
    values = [0]

    match values:
        case [x] if x:
            ...  # This is not executed
        case _:
            ...
    print(x)  # This will print "0"
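
The ordering of binding and guard evaluation can be spelled out in current
Python. This sketch handles only the single pattern ``[x] if x`` and
approximates the sequence check with ``isinstance(values, list)``;
``match_single_truthy`` is a hypothetical name.

```python
def match_single_truthy(values):
    """Emulate ``case [x] if x:`` followed by ``case _:``.

    Returns the name of the case taken and the bindings that were made.
    """
    bindings = {}
    if isinstance(values, list) and len(values) == 1:
        bindings["x"] = values[0]  # bound as soon as the pattern matches
        if bindings["x"]:          # the guard is evaluated afterwards
            return "first case", bindings
    return "wildcard case", bindings


print(match_single_truthy([0]))  # the guard fails, but x stays bound to 0
```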
|
|
|
|
|
|
|
|
|
|
Note that guards are not allowed for nested patterns, so that ``[x if x > 0]``
|
|
|
|
|
is a ``SyntaxError`` and ``1 | 2 if 3 | 4`` will be parsed as
|
|
|
|
|
``(1 | 2) if (3 | 4)``.
|
|
|
|
|
|
|
|
|
|
|
2020-06-30 23:26:10 -04:00
|
|
|
|
Walrus patterns
|
|
|
|
|
---------------
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
2020-06-30 23:26:10 -04:00
|
|
|
|
It is often useful to match a sub-pattern *and* bind the corresponding
|
2020-06-23 11:27:36 -04:00
|
|
|
|
value to a name. For example, it can be useful to write more efficient
|
2020-06-30 23:26:10 -04:00
|
|
|
|
matches, or simply to avoid repetition. To simplify such cases, any pattern
|
|
|
|
|
(other than the walrus pattern itself) can be preceded by a name and
|
|
|
|
|
the walrus operator (``:=``). For example::
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
    match get_shape():
        case Line(start := Point(x, y), end) if start == end:
            print(f"Zero length line at {x}, {y}")
|
|
|
|
|
|
2020-06-30 23:26:10 -04:00
|
|
|
|
The name on the left of the walrus operator can be used in a guard, in
|
2020-06-23 11:27:36 -04:00
|
|
|
|
the match suite, or after the match statement. However, the name will
|
|
|
|
|
*only* be bound if the sub-pattern succeeds. Another example::
|
|
|
|
|
|
|
|
|
|
    match group_shapes():
        case [], [point := Point(x, y), *other]:
            print(f"Got {point} in the second group")
            process_coordinates(x, y)
            ...
|
|
|
|
|
|
|
|
|
|
Technically, most such examples can be rewritten using guards and/or nested
|
|
|
|
|
match statements, but this will be less readable and/or will produce less
|
|
|
|
|
efficient code. Essentially, most of the arguments in PEP 572 apply here
|
|
|
|
|
equally.
|
|
|
|
|
|
|
|
|
|
``_`` is not a valid name here.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
.. _runtime:
|
|
|
|
|
|
|
|
|
|
Runtime specification
|
|
|
|
|
=====================
|
|
|
|
|
|
2020-07-01 11:37:47 -04:00
|
|
|
|
The Match Protocol
|
|
|
|
|
------------------
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
2020-07-01 11:37:47 -04:00
|
|
|
|
The equivalent of an ``isinstance`` call is used to decide whether an
|
|
|
|
|
object matches a given class pattern and to extract the corresponding
|
|
|
|
|
attributes. Classes requiring different matching semantics (such as
|
|
|
|
|
duck-typing) can do so by defining ``__instancecheck__`` (a
|
|
|
|
|
pre-existing metaclass hook) or by using ``typing.Protocol``.
|
2020-06-23 19:28:16 -04:00
|
|
|
|
|
|
|
|
|
The procedure is as follows:
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
2020-07-01 11:37:47 -04:00
|
|
|
|
* The class object for ``Class`` in ``Class(<sub-patterns>)`` is
|
|
|
|
|
looked up and ``isinstance(obj, Class)`` is called, where ``obj`` is
|
|
|
|
|
the value being matched. If false, the match fails.
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
* Otherwise, if any sub-patterns are given in the form of positional
|
|
|
|
|
or keyword arguments, these are matched from left to right, as
|
|
|
|
|
follows. The match fails as soon as a sub-pattern fails; if all
|
|
|
|
|
sub-patterns succeed, the overall class pattern match succeeds.
|
|
|
|
|
|
|
|
|
|
* If there are match-by-position items and the class has a
|
2020-06-26 20:55:59 -04:00
|
|
|
|
``__match_args__``, the item at position ``i``
|
2020-06-23 11:27:36 -04:00
|
|
|
|
is matched against the value looked up by attribute
|
|
|
|
|
``__match_args__[i]``. For example, a pattern ``Point2D(5, 8)``,
|
|
|
|
|
where ``Point2D.__match_args__ == ["x", "y"]``, is translated
|
|
|
|
|
(approximately) into ``obj.x == 5 and obj.y == 8``.
|
|
|
|
|
|
2020-06-29 14:54:42 -04:00
|
|
|
|
* If there are more positional items than the length of
|
|
|
|
|
``__match_args__``, a ``TypeError`` is raised.
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
2020-06-26 20:55:59 -04:00
|
|
|
|
* If the ``__match_args__`` attribute is absent on the matched class,
|
2020-06-29 12:32:21 -04:00
|
|
|
|
    and one or more positional items appear in a match,
|
2020-06-29 14:54:42 -04:00
|
|
|
|
``TypeError`` is also raised. We don't fall back on
|
2020-06-23 11:27:36 -04:00
|
|
|
|
using ``__slots__`` or ``__annotations__`` -- "In the face of ambiguity,
|
|
|
|
|
refuse the temptation to guess."
|
|
|
|
|
|
|
|
|
|
* If there are any match-by-keyword items the keywords are looked up
|
2020-07-01 11:37:47 -04:00
|
|
|
|
as attributes on the target. If the lookup succeeds the value is
|
2020-06-23 11:27:36 -04:00
|
|
|
|
matched against the corresponding sub-pattern. If the lookup fails,
|
2020-06-29 12:32:21 -04:00
|
|
|
|
the match fails.
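
The procedure above can be sketched as an ordinary function. This is an
illustration under simplifying assumptions, not the proposed implementation:
sub-patterns are reduced to literal values compared with ``==`` or to a
wildcard, and ``match_class`` and ``WILDCARD`` are hypothetical names.

```python
WILDCARD = object()  # hypothetical stand-in for the ``_`` sub-pattern


def match_class(obj, cls, positional=(), keywords=None):
    """Approximate ``cls(*positional, **keywords)`` used as a class pattern."""
    keywords = keywords or {}
    if not isinstance(obj, cls):
        return False
    names = []
    if positional:
        match_args = getattr(cls, "__match_args__", None)
        if match_args is None:
            raise TypeError(f"{cls.__name__} has no __match_args__")
        if len(positional) > len(match_args):
            raise TypeError(f"too many positional sub-patterns for {cls.__name__}")
        names = list(match_args[:len(positional)])
    # Positional items first, then keyword items, i.e. left to right.
    items = list(zip(names, positional)) + list(keywords.items())
    for name, sub_pattern in items:
        try:
            value = getattr(obj, name)
        except AttributeError:
            return False  # a failed attribute lookup fails the match
        if sub_pattern is not WILDCARD and not (value == sub_pattern):
            return False
    return True


class Point2D:
    __match_args__ = ["x", "y"]

    def __init__(self, x, y):
        self.x = x
        self.y = y


print(match_class(Point2D(5, 8), Point2D, (5, 8)))        # like Point2D(5, 8)
print(match_class(Point2D(5, 8), Point2D, (5,), {"y": 9}))  # like Point2D(5, y=9)
```

Note that the instance check comes first, so a subject of the wrong type fails
the match before ``__match_args__`` is ever consulted.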
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
Such a protocol favors simplicity of implementation over flexibility and
|
2020-06-28 23:30:08 -04:00
|
|
|
|
performance. For other considered alternatives, see `extended matching`_.
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
2020-06-27 15:22:24 -04:00
|
|
|
|
For the most commonly-matched built-in types (``bool``,
|
|
|
|
|
``bytearray``, ``bytes``, ``dict``, ``float``,
|
|
|
|
|
``frozenset``, ``int``, ``list``, ``set``, ``str``, and ``tuple``), a
|
|
|
|
|
single positional sub-pattern is allowed to be passed to
|
2020-06-26 20:55:59 -04:00
|
|
|
|
the call. Rather than being matched against any particular attribute
|
2020-07-01 11:37:47 -04:00
|
|
|
|
on the target, it is instead matched against the target itself. This
|
2020-06-26 20:55:59 -04:00
|
|
|
|
creates behavior that is useful and intuitive for these objects:
|
|
|
|
|
|
|
|
|
|
* ``bool(False)`` matches ``False`` (but not ``0``).
|
|
|
|
|
* ``tuple((0, 1, 2))`` matches ``(0, 1, 2)`` (but not ``[0, 1, 2]``).
|
|
|
|
|
* ``int(i)`` matches any ``int`` and binds it to the name ``i``.
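
The special handling of these built-in types amounts to an instance check
followed by a match against the subject itself. A minimal sketch, with the
sub-pattern simplified to a literal value and ``match_builtin`` being a
hypothetical helper name:

```python
def match_builtin(obj, cls, sub_pattern):
    """Approximate ``cls(sub_pattern)`` for a built-in such as ``bool`` or ``tuple``.

    The single positional sub-pattern is compared with the subject itself
    rather than with one of its attributes.
    """
    return isinstance(obj, cls) and obj == sub_pattern


print(match_builtin(False, bool, False))            # True
print(match_builtin(0, bool, False))                # False: 0 is not a bool
print(match_builtin((0, 1, 2), tuple, (0, 1, 2)))   # True
print(match_builtin([0, 1, 2], tuple, (0, 1, 2)))   # False: a list is not a tuple
```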
|
|
|
|
|
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
Ambiguous matches
|
|
|
|
|
-----------------
|
|
|
|
|
|
2020-06-29 14:54:42 -04:00
|
|
|
|
Certain classes of impossible and ambiguous matches are detected at
|
|
|
|
|
runtime and will raise exceptions. In addition to basic checks
|
|
|
|
|
described in the previous subsection:
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
* The interpreter will check that two match items are not targeting the same
|
|
|
|
|
attribute, for example ``Point2D(1, 2, y=3)`` is an error.
|
|
|
|
|
|
2020-06-29 14:54:42 -04:00
|
|
|
|
* It will also check that a mapping pattern does not attempt to match
|
|
|
|
|
the same key more than once.
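
Both checks reduce to rejecting duplicate targets. The sketch below is one
possible way to express them; the exception type is not specified in this
section, so ``AmbiguousPatternError`` is a hypothetical placeholder, as are
the helper names.

```python
class AmbiguousPatternError(Exception):
    """Hypothetical placeholder; the draft does not name the exception here."""


def reject_duplicates(targets):
    """Raise if the same attribute name or mapping key is targeted twice."""
    seen = set()
    for target in targets:
        if target in seen:
            raise AmbiguousPatternError(f"{target!r} is matched more than once")
        seen.add(target)


def class_pattern_targets(match_args, n_positional, keyword_names):
    """Attribute names targeted by positional items, then by keyword items."""
    return list(match_args[:n_positional]) + list(keyword_names)


# Point2D(1, 2, y=3) with __match_args__ == ["x", "y"] targets "y" twice.
try:
    reject_duplicates(class_pattern_targets(["x", "y"], 2, ["y"]))
except AmbiguousPatternError as exc:
    print("rejected:", exc)
```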
|
|
|
|
|
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
Special attribute ``__match_args__``
|
|
|
|
|
------------------------------------
|
|
|
|
|
|
2020-07-01 11:37:47 -04:00
|
|
|
|
The ``__match_args__`` attribute is always looked up on the type
|
|
|
|
|
object named in the pattern. If present, it must be a list or tuple
|
|
|
|
|
of strings naming the allowed positional arguments.
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
2020-07-01 11:37:47 -04:00
|
|
|
|
In deciding what names should be available for matching, the
|
|
|
|
|
recommended practice is that class patterns should be the mirror of
|
|
|
|
|
construction; that is, the set of available names and their types
|
|
|
|
|
should resemble the arguments to ``__init__()``.
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
2020-07-01 11:37:47 -04:00
|
|
|
|
Only match-by-name will work by default, and classes should define
|
|
|
|
|
``__match_args__`` as a class attribute if they would like to support
|
|
|
|
|
match-by-position. Additionally, dataclasses and named tuples will
|
|
|
|
|
support match-by-position out of the box. See below for more details.
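
As an illustration of the "mirror of construction" recommendation, a class
might declare ``__match_args__`` in the same order as its ``__init__()``
parameters. The ``Rectangle`` class and the ``mirrors_init`` check below are
hypothetical and only sketch the convention; nothing in the proposal enforces
it.

```python
import inspect


class Rectangle:
    # Order mirrors the parameters of __init__ so that patterns read like calls.
    __match_args__ = ("x0", "y0", "x1", "y1")

    def __init__(self, x0, y0, x1, y1, painted=False):
        self.x0, self.y0, self.x1, self.y1 = x0, y0, x1, y1
        self.painted = painted  # matchable by keyword only in this sketch


def mirrors_init(cls):
    """True if __match_args__ is a prefix of the __init__ parameter names."""
    params = list(inspect.signature(cls.__init__).parameters)[1:]  # drop ``self``
    match_args = list(cls.__match_args__)
    return match_args == params[:len(match_args)]


print(mirrors_init(Rectangle))
```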
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
2020-07-02 17:47:45 -04:00
|
|
|
|
Exception semantics
|
|
|
|
|
-------------------
|
|
|
|
|
|
|
|
|
|
While matching each case, the ``match`` statement may trigger execution of other
|
|
|
|
|
functions (for example ``__getitem__()``, ``__len__()`` or
|
|
|
|
|
a property). Almost every exception caused by those propagates outside of the
|
|
|
|
|
match statement normally. The only case where an exception is not propagated is
|
|
|
|
|
an ``AttributeError`` raised while trying to look up an attribute while matching
|
|
|
|
|
attributes of a Class Pattern; that case results in just a matching failure,
|
|
|
|
|
and the rest of the statement proceeds normally.
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
The standard library
|
2020-06-23 19:28:16 -04:00
|
|
|
|
--------------------
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
To facilitate the use of pattern matching, several changes will be made to
|
|
|
|
|
the standard library:
|
|
|
|
|
|
|
|
|
|
* Namedtuples and dataclasses will have auto-generated ``__match_args__``.
|
|
|
|
|
|
|
|
|
|
* For dataclasses the order of attributes in the generated ``__match_args__``
|
|
|
|
|
will be the same as the order of corresponding arguments in the generated
|
|
|
|
|
``__init__()`` method. This includes the situations where attributes are
|
|
|
|
|
inherited from a superclass.
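
The ordering rule for dataclasses can be previewed with the existing
``dataclasses.fields()`` function. This is a sketch of how such a tuple could
be derived, not the proposed implementation; ``derive_match_args`` is a
hypothetical helper, and skipping fields declared with ``init=False`` is an
assumption.

```python
from dataclasses import dataclass, fields


def derive_match_args(cls):
    """Field names in the order used by the generated __init__() method."""
    return tuple(f.name for f in fields(cls) if f.init)


@dataclass
class Point2D:
    x: int
    y: int


@dataclass
class Point3D(Point2D):
    z: int = 0  # inherited fields come first, as in the generated __init__()


print(derive_match_args(Point2D))
print(derive_match_args(Point3D))
```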
|
|
|
|
|
|
2020-07-01 11:37:47 -04:00
|
|
|
|
In addition, a systematic effort will be put into going through
|
|
|
|
|
existing standard library classes and adding ``__match_args__`` where
|
|
|
|
|
it looks beneficial.
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
|
|
|
|
|
.. _static checkers:
|
|
|
|
|
|
|
|
|
|
Static checkers specification
|
|
|
|
|
=============================
|
|
|
|
|
|
|
|
|
|
Exhaustiveness checks
|
|
|
|
|
---------------------
|
|
|
|
|
|
|
|
|
|
From a reliability perspective, experience shows that missing a case when
|
|
|
|
|
dealing with a set of possible data values leads to hard-to-debug issues,
|
|
|
|
|
thus forcing people to add safety asserts like this::
|
|
|
|
|
|
|
|
|
|
    def get_first(data: Union[int, list[int]]) -> int:
        if isinstance(data, list) and data:
            return data[0]
        elif isinstance(data, int):
            return data
        else:
            assert False, "should never get here"
|
|
|
|
|
|
|
|
|
|
PEP 484 specifies that static type checkers should support exhaustiveness in
|
|
|
|
|
conditional checks with respect to enum values. PEP 586 later generalized this
|
|
|
|
|
requirement to literal types.
|
|
|
|
|
|
|
|
|
|
This PEP further generalizes this requirement to
|
|
|
|
|
arbitrary patterns. A typical situation where this applies is matching an
|
|
|
|
|
expression with a union type::
|
|
|
|
|
|
|
|
|
|
    def classify(val: Union[int, Tuple[int, int], List[int]]) -> str:
        match val:
            case [x, *other]:
                return f"A sequence starting with {x}"
            case [x, y] if x > 0 and y > 0:
                return f"A pair of {x} and {y}"
            case int():
                return f"Some integer"
            # Type-checking error: some cases unhandled.
|
|
|
|
|
|
|
|
|
|
The exhaustiveness checks should also apply where both pattern matching
|
|
|
|
|
and enum values are combined::
|
|
|
|
|
|
|
|
|
|
    from enum import Enum
    from typing import Union

    class Level(Enum):
        BASIC = 1
        ADVANCED = 2
        PRO = 3

    class User:
        name: str
        level: Level

    class Admin:
        name: str

    account: Union[User, Admin]

    match account:
        case Admin(name=name) | User(name=name, level=Level.PRO):
            ...
        case User(level=Level.ADVANCED):
            ...
        # Type-checking error: basic user unhandled
|
|
|
|
|
|
|
|
|
|
Obviously, no ``Matchable`` protocol (in terms of PEP 544) is needed, since
|
|
|
|
|
every class is matchable and therefore is subject to the checks specified
|
|
|
|
|
above.
|
|
|
|
|
|
|
|
|
|
|
2020-06-24 21:14:29 -04:00
|
|
|
|
Sealed classes as algebraic data types
|
|
|
|
|
--------------------------------------
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
Quite often it is desirable to apply exhaustiveness to a set of classes without
|
|
|
|
|
defining ad-hoc union types, which is itself fragile if a class is missing in
|
|
|
|
|
the union definition. A design pattern where a group of record-like classes is
|
|
|
|
|
combined into a union is popular in other languages that support pattern
|
2020-06-24 21:14:29 -04:00
|
|
|
|
matching and is known as algebraic data types [2]_.
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
We propose to add a special decorator class ``@sealed`` to the ``typing``
|
|
|
|
|
module [6]_, that will have no effect at runtime, but will indicate to static
|
|
|
|
|
type checkers that all subclasses (direct and indirect) of this class should
|
|
|
|
|
be defined in the same module as the base class.
|
|
|
|
|
|
|
|
|
|
The idea is that since all subclasses are known, the type checker can treat
|
|
|
|
|
the sealed base class as a union of all its subclasses. Together with
|
2020-06-24 21:14:29 -04:00
|
|
|
|
dataclasses this allows clean and safe support for algebraic data types
|
|
|
|
|
in Python. Consider this example::
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
    from dataclasses import dataclass
    from typing import sealed

    @sealed
    class Node:
        ...

    class Expression(Node):
        ...

    class Statement(Node):
        ...

    @dataclass
    class Name(Expression):
        name: str

    @dataclass
    class Operation(Expression):
        left: Expression
        op: str
        right: Expression

    @dataclass
    class Assignment(Statement):
        target: str
        value: Expression

    @dataclass
    class Print(Statement):
        value: Expression
|
|
|
|
|
|
|
|
|
|
With such a definition, a type checker can safely treat ``Node`` as
|
|
|
|
|
``Union[Name, Operation, Assignment, Print]``, and also safely treat e.g.
|
|
|
|
|
``Expression`` as ``Union[Name, Operation]``. So this will result in a type
|
|
|
|
|
checking error in the snippet below, because ``Name`` is not handled (and the type
|
|
|
|
|
checker can give a useful error message)::
|
|
|
|
|
|
|
|
|
|
    def dump(node: Node) -> str:
        match node:
            case Assignment(target, value):
                return f"{target} = {dump(value)}"
            case Print(value):
                return f"print({dump(value)})"
            case Operation(left, op, right):
                return f"({dump(left)} {op} {dump(right)})"
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Type erasure
|
|
|
|
|
------------
|
|
|
|
|
|
|
|
|
|
Class patterns are subject to runtime type erasure. Namely, although one
|
|
|
|
|
can define a type alias ``IntQueue = Queue[int]`` so that a pattern like
|
2020-06-23 18:20:24 -04:00
|
|
|
|
``IntQueue()`` is syntactically valid, type checkers should reject such a
|
2020-06-23 11:27:36 -04:00
|
|
|
|
match::
|
|
|
|
|
|
|
|
|
|
    queue: Union[Queue[int], Queue[str]]
    match queue:
        case IntQueue():  # Type-checking error here
            ...
|
|
|
|
|
|
|
|
|
|
Note that the above snippet actually fails at runtime with the current
|
|
|
|
|
implementation of generic classes in the ``typing`` module, as well as
|
|
|
|
|
with builtin generic classes in the recently accepted PEP 585, because
|
|
|
|
|
they prohibit ``isinstance`` checks.
|
|
|
|
|
|
|
|
|
|
To clarify, generic classes are not prohibited in general from participating
|
|
|
|
|
in pattern matching; it is just that their type parameters can't be explicitly
|
|
|
|
|
specified. It is still fine if sub-patterns or literals bind the type
|
|
|
|
|
variables. For example::
|
|
|
|
|
|
|
|
|
|
    from typing import Generic, TypeVar, Union

    T = TypeVar('T')

    class Result(Generic[T]):
        first: T
        other: list[T]

    result: Union[Result[int], Result[str]]

    match result:
        case Result(first=int()):
            ...  # Type of result is Result[int] here
        case Result(other=["foo", "bar", *rest]):
            ...  # Type of result is Result[str] here
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Note about constants
|
|
|
|
|
--------------------
|
|
|
|
|
|
2020-06-29 15:11:20 -04:00
|
|
|
|
The fact that a capture pattern is always an assignment target may create unwanted
|
2020-06-23 11:27:36 -04:00
|
|
|
|
consequences when a user by mistake tries to "match" a value against
|
|
|
|
|
a constant instead of using the constant value pattern. As a result, at
|
|
|
|
|
runtime such a match will always succeed and, moreover, overwrite the value of
|
|
|
|
|
the constant. It is important therefore that static type checkers warn about
|
|
|
|
|
such situations. For example::
|
|
|
|
|
|
|
|
|
|
    from typing import Final

    MAX_INT: Final = 2 ** 64

    value = 0

    match value:
        case MAX_INT:  # Type-checking error here: cannot assign to final name
            print("Got big number")
        case .MAX_INT:  # This is OK
            print("Got big number")
        case _:
            print("Something else")
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Precise type checking of star matches
|
|
|
|
|
-------------------------------------
|
|
|
|
|
|
|
|
|
|
Type checkers should perform precise type checking of star items in pattern
|
|
|
|
|
matching giving them either a heterogeneous ``list[T]`` type, or
|
|
|
|
|
a ``TypedDict`` type as specified by PEP 589. For example::
|
|
|
|
|
|
|
|
|
|
    stuff: Tuple[int, str, str, float]

    match stuff:
        case a, *b, 0.5:
            # Here a is int and b is list[str]
            ...
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Performance Considerations
|
|
|
|
|
==========================
|
|
|
|
|
|
|
|
|
|
Ideally, a ``match`` statement should have good runtime performance compared
|
|
|
|
|
to an equivalent chain of if-statements. Although the history of programming
|
|
|
|
|
languages is rife with examples of new features which increased engineer
|
|
|
|
|
productivity at the expense of additional CPU cycles, it would be
|
|
|
|
|
unfortunate if the benefits of ``match`` were counter-balanced by a significant
|
|
|
|
|
overall decrease in runtime performance.
|
|
|
|
|
|
|
|
|
|
Although this PEP does not specify any particular implementation strategy,
|
|
|
|
|
a few words about the prototype implementation and how it attempts to
|
|
|
|
|
maximize performance are in order.
|
|
|
|
|
|
|
|
|
|
Basically, the prototype implementation transforms all of the ``match``
|
|
|
|
|
statement syntax into equivalent if/else blocks - or more accurately, into
|
|
|
|
|
Python byte codes that have the same effect. In other words, all of the
|
|
|
|
|
logic for testing instance types, sequence lengths, mapping keys and
|
|
|
|
|
so on is inlined in place of the ``match``.
|
|
|
|
|
|
|
|
|
|
This is not the only possible strategy, nor is it necessarily the best.
|
2020-07-01 11:37:47 -04:00
|
|
|
|
For example, the instance checks could be memoized, especially
|
2020-06-23 11:27:36 -04:00
|
|
|
|
if there are multiple instances of the same class type but with different
|
|
|
|
|
arguments in a single match statement. It is also theoretically
|
|
|
|
|
possible for a future implementation to process the case clauses in
|
|
|
|
|
parallel using a decision tree rather than testing them one by one.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Backwards Compatibility
|
|
|
|
|
=======================
|
|
|
|
|
|
|
|
|
|
This PEP is fully backwards compatible: the ``match`` and ``case``
|
|
|
|
|
keywords are proposed to be (and stay!) soft keywords, so their use as
|
|
|
|
|
variable, function, class, module or attribute names is not impeded at
|
|
|
|
|
all.
|
|
|
|
|
|
|
|
|
|
This is important because ``match`` is the name of a popular and
|
|
|
|
|
well-known function and method in the ``re`` module, which we have no
|
|
|
|
|
desire to break or deprecate.
|
|
|
|
|
|
|
|
|
|
The difference between hard and soft keywords is that hard keywords
|
|
|
|
|
are *always* reserved words, even in positions where they make no
|
|
|
|
|
sense (e.g. ``x = class + 1``), while soft keywords only get a special
|
|
|
|
|
meaning in context. Since our parser backtracks, that means that on
|
|
|
|
|
different attempts to parse a code fragment it could interpret a soft
|
|
|
|
|
keyword differently.
|
|
|
|
|
|
|
|
|
|
For example, suppose the parser encounters the following input::
|
|
|
|
|
|
|
|
|
|
    match [x, y]:
|
|
|
|
|
|
|
|
|
|
The parser first attempts to parse this as an expression statement.
|
|
|
|
|
It interprets ``match`` as a NAME token, and then considers ``[x,
|
|
|
|
|
y]`` to be a double subscript. It then encounters the colon and has
|
|
|
|
|
to backtrack, since an expression statement cannot be followed by a
|
|
|
|
|
colon. The parser then backtracks to the start of the line and finds
|
|
|
|
|
that ``match`` is a soft keyword allowed in this position. It then
|
|
|
|
|
considers ``[x, y]`` to be a list expression. The colon then is just
|
|
|
|
|
what the parser expected, and the parse succeeds.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Impacts on third-party tools
|
|
|
|
|
============================
|
|
|
|
|
|
|
|
|
|
There are a lot of tools in the Python ecosystem that operate on Python
|
|
|
|
|
source code: linters, syntax highlighters, auto-formatters, and IDEs. These
|
|
|
|
|
will all need to be updated to include awareness of the ``match`` statement.
|
|
|
|
|
|
|
|
|
|
In general, these tools fall into one of two categories:
|
|
|
|
|
|
|
|
|
|
**Shallow** parsers don't try to understand the full syntax of Python, but
|
|
|
|
|
instead scan the source code for specific known patterns. IDEs, such as Visual
|
|
|
|
|
Studio Code, Emacs and TextMate, tend to fall in this category, since frequently
|
|
|
|
|
the source code is invalid while being edited, and a strict approach to parsing
|
|
|
|
|
would fail.
|
|
|
|
|
|
|
|
|
|
For these kinds of tools, adding knowledge of a new keyword is relatively
|
|
|
|
|
easy, just an addition to a table, or perhaps modification of a regular
|
|
|
|
|
expression.
|
|
|
|
|
|
|
|
|
|
**Deep** parsers understand the complete syntax of Python. An example of this
|
|
|
|
|
is the auto-formatter Black [9]_. A particular requirement with these kinds of
|
|
|
|
|
tools is that they not only need to understand the syntax of the current version
|
|
|
|
|
of Python, but older versions of Python as well.
|
|
|
|
|
|
|
|
|
|
The ``match`` statement uses a soft keyword, and it is one of the first major
|
|
|
|
|
Python features to take advantage of the capabilities of the new PEG parser. This
|
|
|
|
|
means that third-party parsers which are not 'PEG-compatible' will have a hard
|
|
|
|
|
time with the new syntax.
|
|
|
|
|
|
|
|
|
|
It has been noted that a number of these third-party tools leverage common parsing
|
|
|
|
|
libraries (Black for example uses a fork of the lib2to3 parser). It may be helpful
|
|
|
|
|
to identify widely-used parsing libraries (such as parso [10]_ and libCST [11]_)
|
|
|
|
|
and upgrade them to be PEG compatible.
|
|
|
|
|
|
|
|
|
|
However, since this work would need to be done not only for the match statement,
|
2020-06-23 15:38:03 -04:00
|
|
|
|
but for *any* new Python syntax that leverages the capabilities of the PEG parser,
|
2020-06-23 11:27:36 -04:00
|
|
|
|
it is considered out of scope for this PEP. (Although it is suggested that this
|
|
|
|
|
would make a fine Summer of Code project.)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Reference Implementation
|
|
|
|
|
========================
|
|
|
|
|
|
|
|
|
|
A CPython implementation is
|
|
|
|
|
`currently under development <https://github.com/brandtbucher/cpython/tree/patma>`_,
|
|
|
|
|
and is almost entirely feature-complete.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Example Code
|
|
|
|
|
============
|
|
|
|
|
|
|
|
|
|
A small collection of example code is
|
|
|
|
|
`available on GitHub <https://github.com/gvanrossum/patma/tree/master/examples>`_.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
.. _rejected ideas:
|
|
|
|
|
|
|
|
|
|
Rejected Ideas
|
|
|
|
|
==============
|
|
|
|
|
|
2020-06-23 18:20:24 -04:00
|
|
|
|
This general idea has been floating around for a pretty long time, and many
|
2020-06-23 11:27:36 -04:00
|
|
|
|
back and forth decisions were made. Here we summarize many alternative
|
2020-06-23 18:20:24 -04:00
|
|
|
|
paths that were taken but eventually abandoned.
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
Don't do this, pattern matching is hard to learn
|
|
|
|
|
------------------------------------------------
|
|
|
|
|
|
|
|
|
|
In our opinion, the proposed pattern matching is not more difficult than
|
|
|
|
|
adding ``isinstance()`` and ``getattr()`` to iterable unpacking. Also, we
|
|
|
|
|
believe the proposed syntax significantly improves readability for a wide
|
|
|
|
|
range of code patterns, by allowing one to express *what* one wants to do, rather
|
|
|
|
|
than *how* to do it. We hope the few real code snippets we included in the PEP
|
|
|
|
|
above illustrate this comparison well enough. For more real code examples
|
|
|
|
|
and their translations see Ref. [7]_.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Allow more flexible assignment targets instead
|
|
|
|
|
----------------------------------------------
|
|
|
|
|
|
|
|
|
|
There was an idea to instead just generalize the iterable unpacking to much
|
|
|
|
|
more general assignment targets, instead of adding a new kind of statement.
|
|
|
|
|
This concept is known in some other languages as "irrefutable matches". We
|
|
|
|
|
decided not to do this because inspection of real-life potential use cases
|
|
|
|
|
showed that in the vast majority of cases destructuring is related to an ``if``
|
|
|
|
|
condition. Also many of those are grouped in a series of exclusive choices.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Make it an expression
|
|
|
|
|
---------------------
|
|
|
|
|
|
|
|
|
|
In most other languages pattern matching is represented by an expression, not
|
|
|
|
|
a statement. But making it an expression would be inconsistent with other
|
|
|
|
|
syntactic choices in Python. All decision making logic is expressed almost
|
|
|
|
|
exclusively in statements, so we decided not to deviate from this.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Use a hard keyword
|
|
|
|
|
------------------
|
|
|
|
|
|
|
|
|
|
There were options to make ``match`` a hard keyword, or choose a different
|
|
|
|
|
keyword. Although using a hard keyword would simplify life for simple-minded
|
|
|
|
|
syntax highlighters, we decided not to use a hard keyword for several reasons:
|
|
|
|
|
|
|
|
|
|
* Most importantly, the new parser doesn't require us to do this. Unlike with
|
|
|
|
|
``async``, which caused hardships by being a soft keyword for a few releases,
|
|
|
|
|
here we can make ``match`` a permanent soft keyword.
|
|
|
|
|
|
|
|
|
|
* ``match`` is so commonly used in existing code, that it would break almost
|
|
|
|
|
every existing program and would put the burden of fixing code on many people who
|
|
|
|
|
may not even benefit from the new syntax.
|
|
|
|
|
|
|
|
|
|
* It is hard to find an alternative keyword that would not be commonly used
|
|
|
|
|
in existing programs as an identifier, and would still clearly reflect the
|
|
|
|
|
meaning of the statement.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Use ``as`` or ``|`` instead of ``case`` for case clauses
|
|
|
|
|
--------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
The pattern matching proposed here is a combination of multi-branch control
|
|
|
|
|
flow (in line with ``switch`` in Algol-derived languages or ``cond`` in Lisp)
|
|
|
|
|
and object-deconstruction as found in functional languages. While the proposed
|
|
|
|
|
keyword ``case`` highlights the multi-branch aspect, alternative keywords such
|
|
|
|
|
as ``as`` would equally be possible, highlighting the deconstruction aspect.
|
|
|
|
|
``as`` or ``with``, for instance, also have the advantage of already being
|
|
|
|
|
keywords in Python. However, since ``case`` as a keyword can only occur as a
|
|
|
|
|
leading keyword inside a ``match`` statement, it is easy for a parser to
|
|
|
|
|
distinguish between its use as a keyword or as a variable.
|
|
|
|
|
|
|
|
|
|
Other variants would use a symbol like ``|`` or ``=>``, or go entirely without
|
|
|
|
|
a special marker.
|
|
|
|
|
|
|
|
|
|
Since Python is a statement-oriented language in the tradition of Algol, and as
|
|
|
|
|
each composite statement starts with an identifying keyword, ``case`` seemed to
|
|
|
|
|
be most in line with Python's style and traditions.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Use a flat indentation scheme
|
|
|
|
|
-----------------------------
|
|
|
|
|
|
|
|
|
|
There was an idea to use an alternative indentation scheme, for example where
|
|
|
|
|
every case clause would not be indented with respect to the initial ``match``
|
|
|
|
|
part::
|
|
|
|
|
|
|
|
|
|
    match expression:
    case pattern_1:
        ...
    case pattern_2:
        ...
|
|
|
|
|
|
|
|
|
|
This was rejected because, although flat indentation saves some horizontal space,
|
|
|
|
|
it may look awkward to the eye of a Python programmer, because everywhere else
|
|
|
|
|
a colon is followed by an indent. This would also complicate life for
|
|
|
|
|
simple-minded code editors. Finally, the horizontal space issue can be
|
|
|
|
|
alleviated by allowing "half-indent" (i.e. two spaces instead of four) for
|
|
|
|
|
match statements.
|
|
|
|
|
|
|
|
|
|
In sample programs using ``match``, written as part of the development of this
|
2020-06-23 15:21:43 -04:00
|
|
|
|
PEP, a noticeable improvement in code brevity is observed, more than making up
|
2020-06-23 11:27:36 -04:00
|
|
|
|
for the additional indentation level.
|
|
|
|
|
|
2020-06-25 00:26:17 -04:00
|
|
|
|
Another proposal considered was to use flat indentation but put the
|
|
|
|
|
expression on the line after ``match:``, like this::
|
|
|
|
|
|
|
|
|
|
    match:
        expression
    case pattern_1:
        ...
    case pattern_2:
        ...
|
|
|
|
|
|
|
|
|
|
This was ultimately rejected because the first block would be a
|
|
|
|
|
novelty in Python's grammar: a block whose only content is a single
|
|
|
|
|
expression rather than a sequence of statements.
|
2020-06-23 19:28:16 -04:00
|
|
|
|
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
Alternatives for constant value pattern
|
|
|
|
|
---------------------------------------
|
|
|
|
|
|
|
|
|
|
This is probably the trickiest item. Matching against some pre-defined
|
|
|
|
|
constants is very common, but the dynamic nature of Python also makes it
|
2020-06-29 15:11:20 -04:00
|
|
|
|
ambiguous with capture patterns. Four other alternatives were considered:
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
* Use some implicit rules. For example if a name was defined in the global
|
2020-06-29 15:11:20 -04:00
|
|
|
|
scope, then it refers to a constant, rather than representing a capture pattern::
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
      FOO = 1
      value = 0

      match value:
          case FOO:  # This would not be matched
              ...
          case BAR:  # This would be matched
              ...
|
|
|
|
|
|
|
|
|
|
This however can cause surprises and action at a distance if someone
|
|
|
|
|
defines an unrelated coinciding name before the match statement.
|
|
|
|
|
|
|
|
|
|
* Use a rule based on the case of a name. In particular, if the name
|
2020-06-29 15:11:20 -04:00
|
|
|
|
starts with a lowercase letter it would be a capture pattern, while if
|
2020-06-23 11:27:36 -04:00
|
|
|
|
it starts with uppercase it would refer to a constant::
|
|
|
|
|
|
|
|
|
|
      FOO = 1
      value = 0

      match value:
          case FOO:  # This would not be matched
              ...
          case bar:  # This would be matched
              ...
|
|
|
|
|
|
|
|
|
|
This works well with the recommendations for naming constants from
|
|
|
|
|
PEP 8. The main objection is that there's no other part of core
|
|
|
|
|
Python where the case of a name is semantically significant. (Then
|
|
|
|
|
again a leading dot in an expression has no precedent either -- its
|
|
|
|
|
use in ``import`` statements is quite different, since it resembles
|
|
|
|
|
the ``.`` used to denote the current directory in filesystems.)
|
|
|
|
|
|
|
|
|
|
* Use extra parentheses to indicate lookup semantics for a given name. For
|
|
|
|
|
example::
|
|
|
|
|
|
|
|
|
|
      FOO = 1
      value = 0

      match value:
          case (FOO):  # This would not be matched
              ...
          case BAR:  # This would be matched
              ...
|
|
|
|
|
|
|
|
|
|
This may be a viable option, but it can create some visual noise if used
|
|
|
|
|
often. Also honestly it looks pretty unusual, especially in nested contexts.
|
|
|
|
|
|
|
|
|
|
This also has the problem that we may want or need parentheses to
|
|
|
|
|
disambiguate grouping in patterns, e.g. in ``Point(x, y=(y :=
|
|
|
|
|
complex()))``.
|
|
|
|
|
|
|
|
|
|
* Introduce a special symbol, for example ``$`` or ``^`` to indicate that
|
|
|
|
|
a given name is a constant to be matched against, not to be assigned to::
|
|
|
|
|
|
|
|
|
|
      FOO = 1
      value = 0

      match value:
          case $FOO:  # This would not be matched
              ...
          case BAR:  # This would be matched
              ...
|
|
|
|
|
|
|
|
|
|
The problem with this approach is that introducing a new syntax for such
|
|
|
|
|
a narrow use case is probably overkill.
|
|
|
|
|
|
|
|
|
|
* There was also an idea to make lookup semantics the default, and require
|
2020-06-29 15:11:20 -04:00
|
|
|
|
``$`` to be used in capture patterns::
|
2020-06-23 11:27:36 -04:00
|
|
|
|
|
|
|
|
|
      FOO = 1
      value = 0

      match value:
          case FOO:  # This would not be matched
              ...
          case $BAR:  # This would be matched
              ...
|
|
|
|
|
|
2020-06-29 15:11:20 -04:00
|
|
|
|
But the capture patterns are more common in typical code, so having special
|
2020-06-23 11:27:36 -04:00
|
|
|
|
syntax for the common case would be weird.
|
|
|
|
|
|
|
|
|
|
In the end, these alternatives were rejected because of the mentioned drawbacks.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Disallow float literals in patterns
|
|
|
|
|
-----------------------------------
|
|
|
|
|
|
|
|
|
|
Because of the inexactness of floats, an early version of this proposal
|
|
|
|
|
did not allow floating-point constants to be used as match patterns. Part
|
|
|
|
|
of the justification for this prohibition is that Rust does this.
|
|
|
|
|
|
|
|
|
|
However, during implementation, it was discovered that distinguishing between
|
|
|
|
|
float values and other types required extra code in the VM that would slow
|
|
|
|
|
matches generally. Given that Python and Rust are very different languages
|
|
|
|
|
with different user bases and underlying philosophies, it was felt that
|
|
|
|
|
allowing float literals would not cause too much harm, and would be less
|
|
|
|
|
surprising to users.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Range matching patterns
|
|
|
|
|
-----------------------
|
|
|
|
|
|
|
|
|
|
This would allow patterns such as ``1...6``. However, there are a host of
|
|
|
|
|
ambiguities:
|
|
|
|
|
|
|
|
|
|
* Is the range open, half-open, or closed? (I.e. is ``6`` included in the
|
|
|
|
|
above example or not?)
|
|
|
|
|
* Does the range match a single number, or a range object?
|
|
|
|
|
* Range matching is often used for character ranges ('a'...'z') but that
|
|
|
|
|
won't work in Python since there's no character data type, just strings.
|
|
|
|
|
* Range matching can be a significant performance optimization if you can
|
|
|
|
|
pre-build a jump table, but that's not generally possible in Python due
|
|
|
|
|
to the fact that names can be dynamically rebound.
|
|
|
|
|
|
|
|
|
|
Rather than creating a special-case syntax for ranges, it was decided
|
|
|
|
|
that allowing custom pattern objects (``InRange(0, 6)``) would be more flexible
|
|
|
|
|
and less ambiguous; however those ideas have been postponed for the time
|
|
|
|
|
being (see `deferred ideas`_).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Use dispatch dict semantics for matches
|
|
|
|
|
---------------------------------------
|
|
|
|
|
|
|
|
|
|
Implementations of the classic ``switch`` statement sometimes use a pre-computed
|
|
|
|
|
hash table instead of chained equality comparisons to gain some performance.
|
|
|
|
|
In the context of the ``match`` statement this is technically also possible for
|
|
|
|
|
matches against literal patterns. However, having subtly different semantics
|
|
|
|
|
for different kinds of patterns would be too surprising for a potentially
|
|
|
|
|
modest performance win.
|
|
|
|
|
|
|
|
|
|
We can still experiment with possible performance optimizations in this
|
|
|
|
|
direction if they will not cause semantic differences.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Use ``continue`` and ``break`` in case clauses
|
|
|
|
|
-----------------------------------------------
|
|
|
|
|
|
|
|
|
|
Another rejected proposal was to define new meanings for ``continue``
|
|
|
|
|
and ``break`` inside of ``match``, which would have the following behavior:
|
|
|
|
|
|
|
|
|
|
* ``continue`` would exit the current case clause and continue matching
|
|
|
|
|
at the next case clause.
|
|
|
|
|
* ``break`` would exit the match statement.
|
|
|
|
|
|
|
|
|
|
However, there is a serious drawback to this proposal: if the ``match`` statement
|
|
|
|
|
is nested inside of a loop, the meanings of ``continue`` and ``break`` are now
|
|
|
|
|
changed. This may cause unexpected behavior during refactorings; also, an
|
|
|
|
|
argument can be made that there are other means to get the same behavior (such
|
|
|
|
|
as using guard conditions), and that in practice it's likely that the existing
|
|
|
|
|
behavior of ``continue`` and ``break`` is far more useful.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
AND (``&``) patterns
|
|
|
|
|
--------------------
|
|
|
|
|
|
|
|
|
|
This proposal defines an OR-pattern (``|``) to match one of several alternates;
|
|
|
|
|
why not also an AND-pattern (``&``)? Especially given that some other languages
|
|
|
|
|
(F# for example) support this.
|
|
|
|
|
|
|
|
|
|
However, it's not clear how useful this would be. The semantics for matching
|
|
|
|
|
dictionaries, objects and sequences already incorporates an implicit 'and': all
|
|
|
|
|
attributes and elements mentioned must be present for the match to succeed. Guard
|
|
|
|
|
conditions can also support many of the use cases that a hypothetical 'and'
|
|
|
|
|
operator would be used for.
|
|
|
|
|
|
|
|
|
|
In the end, it was decided that this would make the syntax more complex without
|
|
|
|
|
adding a significant benefit.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Negative match patterns
|
|
|
|
|
-----------------------
|
|
|
|
|
|
|
|
|
|
A negation of a match pattern using the operator ``!`` as a prefix would match
|
|
|
|
|
exactly if the pattern itself does not match. For instance, ``!(3 | 4)``
|
|
|
|
|
would match anything except ``3`` or ``4``.
|
|
|
|
|
|
|
|
|
|
This was rejected because there is documented evidence [8]_ that this feature
|
|
|
|
|
is rarely useful (in languages which support it) or used as double negation
|
|
|
|
|
``!!`` to control variable scopes and prevent variable bindings (which does
|
|
|
|
|
not apply to Python). It can also be simulated using guard conditions.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Check exhaustiveness at runtime
|
|
|
|
|
-------------------------------
|
|
|
|
|
|
|
|
|
|
The question is what to do if no case clause has a matching pattern, and
|
|
|
|
|
there is no default case. An earlier version of the proposal specified that
|
|
|
|
|
the behavior in this case would be to throw an exception rather than
|
|
|
|
|
silently falling through.
|
|
|
|
|
|
|
|
|
|
The arguments back and forth were many, but in the end the EIBTI (Explicit
|
|
|
|
|
Is Better Than Implicit) argument won out: it's better to have the programmer
|
|
|
|
|
explicitly throw an exception if that is the behavior they want.
|
|
|
|
|
|
|
|
|
|
For cases such as sealed classes and enums, where the patterns are all known
|
|
|
|
|
to be members of a discrete set, `static checkers`_ can warn about missing
|
|
|
|
|
patterns.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Type annotations for pattern variables
|
|
|
|
|
--------------------------------------
|
|
|
|
|
|
|
|
|
|
The proposal was to combine patterns with type annotations::
|
|
|
|
|
|
|
|
|
|
    match x:
        case [a: int, b: str]: print(f"An int {a} and a string {b}:")
        case [a: int, b: int, c: int]: print(f"Three ints", a, b, c)
        ...
|
|
|
|
|
|
|
|
|
|
This idea has a lot of problems. For one, the colon can only
|
|
|
|
|
be used inside of brackets or parens, otherwise the syntax becomes
|
|
|
|
|
ambiguous. And because Python disallows ``isinstance()`` checks
|
|
|
|
|
on generic types, type annotations containing generics will not
|
|
|
|
|
work as expected.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Allow ``*rest`` in class patterns
|
|
|
|
|
---------------------------------
|
|
|
|
|
|
|
|
|
|
It was proposed to allow ``*rest`` in a class pattern, giving a
|
|
|
|
|
variable to be bound to all positional arguments at once (similar to
|
|
|
|
|
its use in unpacking assignments). It would provide some symmetry
|
|
|
|
|
with sequence patterns. But it might be confused with a feature to
|
|
|
|
|
provide the *values* for all positional arguments at once. And there
|
|
|
|
|
seems to be no practical need for it, so it was scrapped. (It could
|
|
|
|
|
easily be added at a later stage if a need arises.)
|
|
|
|
|
|
|
|
|
|
Disallow ``._`` and ``_.a`` in constant value patterns
|
|
|
|
|
------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
The first public draft said that the initial name in a constant value
|
|
|
|
|
pattern must not be ``_`` because ``_`` has a special meaning in
|
|
|
|
|
pattern matching, so these would be invalid::
|
|
|
|
|
|
|
|
|
|
case ._: ...
|
|
|
|
|
case _.a: ...
|
|
|
|
|
|
|
|
|
|
(However, ``a._`` would be legal and load the attribute with name
|
|
|
|
|
``_`` of the object ``a`` as usual.)
|
|
|
|
|
|
|
|
|
|
There was some pushback against this on python-dev (some people have a
|
|
|
|
|
legitimate use for ``_`` as an important global variable, esp. in
|
|
|
|
|
i18n) and the only reason for this prohibition was to prevent some
|
|
|
|
|
user confusion. But it's not the hill to die on.
|
|
|
|
|
|
|
|
|
|
Use some other token as wildcard
|
|
|
|
|
--------------------------------
|
|
|
|
|
|
|
|
|
|
It has been proposed to use ``...`` (i.e., the ellipsis token) or
|
|
|
|
|
``*`` (star) as a wildcard. However, both these look as if an
|
|
|
|
|
arbitrary number of items is omitted::
|
|
|
|
|
|
|
|
|
|
case [a, ..., z]: ...
|
|
|
|
|
case [a, *, z]: ...
|
|
|
|
|
|
|
|
|
|
Both look like they would match a sequence of two or more items,
|
|
|
|
|
capturing the first and last values.
|
|
|
|
|
|
|
|
|
|
In addition, if ``*`` were to be used as the wildcard character, we
|
|
|
|
|
would have to come up with some other way to capture the rest of a
|
|
|
|
|
sequence, currently spelled like this::
|
|
|
|
|
|
|
|
|
|
case [first, second, *rest]: ...
|
|
|
|
|
|
|
|
|
|
Using an ellipsis would also be more confusing in documentation and
|
|
|
|
|
examples, where ``...`` is routinely used to indicate something
|
|
|
|
|
obvious or irrelevant. (Yes, this would also be an argument against
|
|
|
|
|
the other uses of ``...`` in Python, but that water is already under
|
|
|
|
|
the bridge.)
|
|
|
|
|
|
|
|
|
|
Another proposal was to use ``?``. This could be acceptable, although
|
|
|
|
|
it would require modifying the tokenizer.
|
|
|
|
|
|
|
|
|
|
Also, ``_`` is already used
|
|
|
|
|
as a throwaway target in other contexts, and this use is pretty
|
|
|
|
|
similar. This example is from ``difflib.py`` in the stdlib::
|
|
|
|
|
|
|
|
|
|
for tag, _, _, j1, j2 in group: ...
|
|
|
|
|
|
|
|
|
|
Perhaps the most convincing argument is that ``_`` is used as the
|
|
|
|
|
wildcard in every other language we've looked at supporting pattern
|
|
|
|
|
matching: C#, Elixir, Erlang, F#, Haskell, Mathematica, OCaml, Ruby,
|
|
|
|
|
Rust, Scala, and Swift. Now, in general, we should not be concerned
|
|
|
|
|
too much with what another language does, since Python is clearly
|
|
|
|
|
different from all these languages. However, if there is such an
|
|
|
|
|
overwhelming and strong consensus, Python should not go out of its way
|
|
|
|
|
to do something completely different -- particularly given that ``_``
|
|
|
|
|
works well in Python and is already in use as a throwaway target.
|
|
|
|
|
|
|
|
|
|
Note that ``_`` is not assigned to by patterns -- this avoids
|
|
|
|
|
conflicts with the use of ``_`` as a marker for translatable strings
|
|
|
|
|
and an alias for ``gettext.gettext``, as recommended by the
|
|
|
|
|
``gettext`` module documentation.
|
|
|
|
|
|
|
|
|
|
Use some other syntax instead of ``|`` for OR patterns
|
|
|
|
|
------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
A few alternatives to using ``|`` to separate the alternatives in OR
|
|
|
|
|
patterns have been proposed. Instead of::
|
|
|
|
|
|
|
|
|
|
case 401|403|404:
|
|
|
|
|
print("Some HTTP error")
|
|
|
|
|
|
|
|
|
|
the following proposals have been fielded:
|
|
|
|
|
|
|
|
|
|
- Use a comma::
|
|
|
|
|
|
|
|
|
|
case 401, 403, 404:
|
|
|
|
|
print("Some HTTP error")
|
|
|
|
|
|
|
|
|
|
This looks too much like a tuple -- we would have to find a
|
|
|
|
|
different way to spell tuples, and the construct would have to be
|
|
|
|
|
parenthesized inside the argument list of a class pattern. In
|
|
|
|
|
general, commas already have many different meanings in Python; we
|
|
|
|
|
shouldn't add more.
|
|
|
|
|
|
|
|
|
|
- Allow stacked cases::
|
|
|
|
|
|
|
|
|
|
case 401:
|
|
|
|
|
case 403:
|
|
|
|
|
case 404:
|
|
|
|
|
print("Some HTTP error")
|
|
|
|
|
|
|
|
|
|
This is how this would be done in C, using its fall-through
|
|
|
|
|
semantics for cases. However, we don't want to mislead people into
|
|
|
|
|
thinking that ``match``/``case`` uses fall-through semantics (which
|
|
|
|
|
are a common source of bugs in C). Also, this would be a novel
|
|
|
|
|
indentation pattern, which might make it harder to support in IDEs
|
|
|
|
|
and such (it would break the simple rule "add an indentation level
|
|
|
|
|
after a line ending in a colon"). Finally, this wouldn't support
|
|
|
|
|
OR patterns nested inside other patterns.
|
|
|
|
|
|
|
|
|
|
- Use ``case in`` followed by a comma-separated list::
|
|
|
|
|
|
|
|
|
|
case in 401, 403, 404:
|
|
|
|
|
print("Some HTTP error")
|
|
|
|
|
|
|
|
|
|
This wouldn't work for OR patterns nested inside other patterns,
|
|
|
|
|
like::
|
|
|
|
|
|
|
|
|
|
case Point(0|1, 0|1):
|
|
|
|
|
print("A corner of the unit square")
|
|
|
|
|
|
|
|
|
|
- Use the ``or`` keyword::
|
|
|
|
|
|
|
|
|
|
case 401 or 403 or 404:
|
|
|
|
|
print("Some HTTP error")
|
|
|
|
|
|
|
|
|
|
This could work, and the readability is not too different from using
|
|
|
|
|
``|``. Some users expressed a preference for ``or`` because they
|
|
|
|
|
associate ``|`` with bitwise OR. However:
|
|
|
|
|
|
|
|
|
|
1. Many other languages that have pattern matching use ``|`` (the
|
|
|
|
|
list includes Elixir, Erlang, F#, Mathematica, OCaml, Ruby, Rust,
|
|
|
|
|
and Scala).
|
|
|
|
|
2. ``|`` is shorter, which may contribute to the readability of
|
|
|
|
|
nested patterns like ``Point(0|1, 0|1)``.
|
|
|
|
|
3. Some people mistakenly believe that ``|`` has the wrong priority;
|
|
|
|
|
but since patterns don't support other operators it has the same
|
|
|
|
|
priority as in expressions.
|
|
|
|
|
4. Python users use ``or`` very frequently, and may build an
|
|
|
|
|
impression that it is strongly associated with Boolean
|
|
|
|
|
short-circuiting.
|
|
|
|
|
5. ``|`` is used between alternatives in regular expressions
|
|
|
|
|
and in EBNF grammars (like Python's own).
|
|
|
|
|
6. ``|`` is not just used for bitwise OR -- it's used for set unions,
|
|
|
|
|
dict merging (:pep:`584`) and is being considered as an
|
|
|
|
|
alternative to ``typing.Union`` (:pep:`604`).
|
|
|
|
|
7. ``|`` works better as a visual separator, especially between
|
|
|
|
|
strings. Compare::
|
|
|
|
|
|
|
|
|
|
case "spam" or "eggs" or "cheese":
|
|
|
|
|
|
|
|
|
|
to::
|
|
|
|
|
|
|
|
|
|
case "spam" | "eggs" | "cheese":
|
|
|
|
|
|
|
|
|
|
Add an ``else`` clause
|
|
|
|
|
----------------------
|
|
|
|
|
|
|
|
|
|
We decided not to add an ``else`` clause for several reasons.
|
|
|
|
|
|
|
|
|
|
- It is redundant, since we already have ``case _:``
|
|
|
|
|
|
|
|
|
|
- There will forever be confusion about the indentation level of the
|
|
|
|
|
``else:`` -- should it align with the list of cases or with the
|
|
|
|
|
``match`` keyword?
|
|
|
|
|
|
|
|
|
|
- Completionist arguments like "every other statement has one" are
|
|
|
|
|
false -- only those statements have an ``else`` clause where it adds
|
|
|
|
|
new functionality.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
.. _deferred ideas:
|
|
|
|
|
|
|
|
|
|
Deferred Ideas
|
|
|
|
|
==============
|
|
|
|
|
|
|
|
|
|
There were a number of proposals to extend the matching syntax that we
|
|
|
|
|
decided to postpone for a possible future PEP. These fall into the realm of
|
|
|
|
|
"cool idea but not essential", and it was felt that it might be better to
|
|
|
|
|
acquire some real-world data on how the match statement will be used in
|
|
|
|
|
practice before moving forward with some of these proposals.
|
|
|
|
|
|
|
|
|
|
Note that in each case, the idea was judged to be a "two-way door",
|
|
|
|
|
meaning that there should be no backwards-compatibility issues with adding
|
|
|
|
|
these features later.
|
|
|
|
|
|
|
|
|
|
One-off syntax variant
|
|
|
|
|
----------------------
|
|
|
|
|
|
|
|
|
|
While inspecting some code-bases that may benefit the most from the proposed
|
|
|
|
|
syntax, it was found that single clause matches would be used relatively often,
|
|
|
|
|
mostly for various special-casing. In other languages this is supported in
|
|
|
|
|
the form of one-off matches. We proposed to support such one-off matches too::
|
|
|
|
|
|
|
|
|
|
if match value as pattern [and guard]:
|
|
|
|
|
...
|
|
|
|
|
|
|
|
|
|
or, alternatively, without the ``if``::
|
|
|
|
|
|
|
|
|
|
match value as pattern [if guard]:
|
|
|
|
|
...
|
|
|
|
|
|
|
|
|
|
as equivalent to the following expansion::
|
|
|
|
|
|
|
|
|
|
match value:
|
|
|
|
|
case pattern [if guard]:
|
|
|
|
|
...
|
|
|
|
|
|
|
|
|
|
To illustrate how this will benefit readability, consider this (slightly
|
|
|
|
|
simplified) snippet from real code::
|
|
|
|
|
|
|
|
|
|
if isinstance(node, CallExpr):
|
|
|
|
|
if (isinstance(node.callee, NameExpr) and len(node.args) == 1 and
|
|
|
|
|
isinstance(node.args[0], NameExpr)):
|
|
|
|
|
call = node.callee.name
|
|
|
|
|
arg = node.args[0].name
|
|
|
|
|
... # Continue special-casing 'call' and 'arg'
|
|
|
|
|
... # Follow with common code
|
|
|
|
|
|
|
|
|
|
This can be rewritten in a more straightforward way as::
|
|
|
|
|
|
|
|
|
|
if match node as CallExpr(callee=NameExpr(name=call), args=[NameExpr(name=arg)]):
|
|
|
|
|
... # Continue special-casing 'call' and 'arg'
|
|
|
|
|
... # Follow with common code
|
|
|
|
|
|
|
|
|
|
This one-off form would not allow ``elif match`` statements, as it was only
|
|
|
|
|
meant to handle a single pattern case. It was intended to be a special case
|
|
|
|
|
of a ``match`` statement, not a special case of an ``if`` statement::
|
|
|
|
|
|
|
|
|
|
if match value_1 as pattern_1 [and guard_1]:
|
|
|
|
|
...
|
|
|
|
|
elif match value_2 as pattern_2 [and guard_2]: # Not allowed
|
|
|
|
|
...
|
|
|
|
|
elif match value_3 as pattern_3 [and guard_3]: # Not allowed
|
|
|
|
|
...
|
|
|
|
|
else: # Also not allowed
|
|
|
|
|
...
|
|
|
|
|
|
|
|
|
|
This would defeat the purpose of one-off matches as a complement to exhaustive
|
|
|
|
|
full matches -- it's better and clearer to use a full match in this case.
|
|
|
|
|
|
|
|
|
|
Similarly, ``if not match`` would not be allowed, since ``match ... as ...`` is not
|
|
|
|
|
an expression. Nor do we propose a ``while match`` construct present in some languages
|
|
|
|
|
with pattern matching, since although it may be handy, it will likely be used
|
|
|
|
|
rarely.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Algebraic matching of repeated names
|
|
|
|
|
------------------------------------
|
|
|
|
|
|
|
|
|
|
A technique occasionally seen in functional languages like Erlang and Elixir is
|
|
|
|
|
to use a match variable multiple times in the same pattern::
|
|
|
|
|
|
|
|
|
|
match value:
|
|
|
|
|
case Point(x, x):
|
|
|
|
|
print("Point is on a diagonal!")
|
|
|
|
|
|
|
|
|
|
The idea here is that the first appearance of ``x`` would bind the value
|
|
|
|
|
to the name, and subsequent occurrences would verify that the incoming
|
|
|
|
|
value was equal to the value previously bound. If the value was not equal,
|
|
|
|
|
the match would fail.
|
|
|
|
|
|
|
|
|
|
However, there are a number of subtleties involved with mixing load-store
|
|
|
|
|
semantics for capture patterns. For the moment, we decided to make repeated
|
|
|
|
|
use of names within the same pattern an error; we can always relax this
|
|
|
|
|
restriction later without affecting backwards compatibility.
|
|
|
|
|
|
|
|
|
|
Note that you **can** use the same name more than once in alternate choices::
|
|
|
|
|
|
|
|
|
|
match value:
|
|
|
|
|
case x | [x]:
|
|
|
|
|
# etc.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
.. _extended matching:
|
|
|
|
|
|
|
|
|
|
Custom matching protocol
|
|
|
|
|
------------------------
|
|
|
|
|
|
|
|
|
|
During the initial design discussions for this PEP, there were a lot of ideas
|
|
|
|
|
thrown around about custom matchers. There were a couple of motivations for
|
|
|
|
|
this:
|
|
|
|
|
|
|
|
|
|
* Some classes might want to expose a different set of "matchable" names
|
|
|
|
|
than the actual class properties.
|
|
|
|
|
* Some classes might have properties that are expensive to calculate, and
|
|
|
|
|
therefore shouldn't be evaluated unless the match pattern actually needed
|
|
|
|
|
access to them.
|
|
|
|
|
* There were ideas for exotic matchers such as ``IsInstance()``,
|
|
|
|
|
``InRange()``, ``RegexMatchingGroup()`` and so on.
|
|
|
|
|
* In order for built-in types and standard library classes to be able
|
|
|
|
|
to support matching in a reasonable and intuitive way, it was believed
|
|
|
|
|
that these types would need to implement special matching logic.
|
|
|
|
|
|
|
|
|
|
These customized match behaviors would be controlled by a special
|
|
|
|
|
``__match__`` method on the class name. There were two competing variants:
|
|
|
|
|
|
|
|
|
|
* A 'full-featured' match protocol which would pass in not only
|
|
|
|
|
the target object to be matched, but detailed information about
|
|
|
|
|
which attributes the specified pattern was interested in.
|
|
|
|
|
* A simplified match protocol, which only passed in the target object,
|
|
|
|
|
and which returned a "proxy object" (which in most cases could be
|
|
|
|
|
just the target) containing the matchable attributes.
|
|
|
|
|
|
|
|
|
|
Here's an example of one version of the more complex protocol proposed::
|
|
|
|
|
|
|
|
|
|
match expr:
|
|
|
|
|
case BinaryOp(left=Number(value=x), op=op, right=Number(value=y)):
|
|
|
|
|
...
|
|
|
|
|
|
|
|
|
|
from types import PatternObject
|
|
|
|
|
BinaryOp.__match__(
|
|
|
|
|
(),
|
|
|
|
|
{
|
|
|
|
|
"left": PatternObject(Number, (), {"value": ...}, -1, False),
|
|
|
|
|
"op": ...,
|
|
|
|
|
"right": PatternObject(Number, (), {"value": ...}, -1, False),
|
|
|
|
|
},
|
|
|
|
|
-1,
|
|
|
|
|
False,
|
|
|
|
|
)
|
|
|
|
|
|
|
|
|
|
One drawback of this protocol is that the arguments to ``__match__``
|
|
|
|
|
would be expensive to construct, and could not be pre-computed due to
|
|
|
|
|
the fact that, because of the way names are bound, there are no real
|
|
|
|
|
constants in Python. It also meant that the ``__match__`` method would
|
|
|
|
|
have to re-implement much of the logic of matching which would otherwise
|
|
|
|
|
be implemented in C code in the Python VM. As a result, this option would
|
|
|
|
|
perform poorly compared to an equivalent ``if``-statement.
|
|
|
|
|
|
|
|
|
|
The simpler protocol suffered from the fact that although it was more
|
|
|
|
|
performant, it was much less flexible, and did not allow for many of
|
|
|
|
|
the creative custom matchers that people were dreaming up.
|
|
|
|
|
|
|
|
|
|
Late in the design process, however, it was realized that the need for
|
|
|
|
|
a custom matching protocol was much less than anticipated. Virtually
|
|
|
|
|
all the realistic (as opposed to fanciful) use cases brought up could
|
|
|
|
|
be handled by the built-in matching behavior, although in a few cases
|
|
|
|
|
an extra guard condition was required to get the desired effect.
|
|
|
|
|
|
|
|
|
|
Moreover, it turned out that none of the standard library classes really
|
|
|
|
|
needed any special matching support other than an appropriate
|
|
|
|
|
``__match_args__`` property.
|
|
|
|
|
|
|
|
|
|
The decision to postpone this feature came with a realization that this is
|
|
|
|
|
not a one-way door; that a more flexible and customizable matching protocol
|
|
|
|
|
can be added later, especially as we gain more experience with real-world
|
|
|
|
|
use cases and actual user needs.
|
|
|
|
|
|
|
|
|
|
The authors of this PEP expect that the ``match`` statement will evolve
|
|
|
|
|
over time as usage patterns and idioms evolve, in a way similar to what
|
|
|
|
|
other "multi-stage" PEPs have done in the past. When this happens, the
|
|
|
|
|
extended matching issue can be revisited.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Parameterized Matching Syntax
|
|
|
|
|
-----------------------------
|
|
|
|
|
|
|
|
|
|
(Also known as "Class Instance Matchers".)
|
|
|
|
|
|
|
|
|
|
This is another variant of the "custom match classes" idea that would allow
|
|
|
|
|
diverse kinds of custom matchers mentioned in the previous section -- however,
|
|
|
|
|
instead of using an extended matching protocol, it would be achieved by
|
|
|
|
|
introducing an additional pattern type with its own syntax. This pattern type
|
|
|
|
|
would accept two distinct sets of parameters: one set which consists of the
|
|
|
|
|
actual parameters passed into the pattern object's constructor, and another
|
|
|
|
|
set representing the binding variables for the pattern.
|
|
|
|
|
|
|
|
|
|
The ``__match__`` method of these objects could use the constructor parameter
|
|
|
|
|
values in deciding what was a valid match.
|
|
|
|
|
|
|
|
|
|
This would allow patterns such as ``InRange<0, 6>(value)``, which would match
|
|
|
|
|
a number in the range 0..6 and assign the matched value to 'value'. Similarly,
|
|
|
|
|
one could have a pattern which tests for the existence of a named group in
|
|
|
|
|
a regular expression match result (different meaning of the word 'match').
|
|
|
|
|
|
|
|
|
|
Although there is some support for this idea, there was a lot of bikeshedding
|
|
|
|
|
on the syntax (there are not a lot of attractive options available)
|
|
|
|
|
and no clear consensus was reached, so it was decided that for now, this
|
|
|
|
|
feature is not essential to the PEP.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Pattern Utility Library
|
|
|
|
|
-----------------------
|
|
|
|
|
|
|
|
|
|
Both of the previous ideas would be accompanied by a new Python standard
|
|
|
|
|
library module which would contain a rich set of exotic and useful matchers.
|
|
|
|
|
However, it is not really possible to implement such a library without
|
|
|
|
|
adopting one of the extended pattern proposals given in the previous sections,
|
|
|
|
|
so this idea is also deferred.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
References
|
|
|
|
|
==========
|
|
|
|
|
|
|
|
|
|
.. [1]
|
|
|
|
|
https://en.wikipedia.org/wiki/Pattern_matching
|
|
|
|
|
|
|
|
|
|
.. [2]
|
|
|
|
|
https://en.wikipedia.org/wiki/Algebraic_data_type
|
|
|
|
|
|
|
|
|
|
.. [3]
|
|
|
|
|
https://doc.rust-lang.org/reference/patterns.html
|
|
|
|
|
|
|
|
|
|
.. [4]
|
|
|
|
|
https://docs.scala-lang.org/tour/pattern-matching.html
|
|
|
|
|
|
|
|
|
|
.. [5]
|
|
|
|
|
https://docs.python.org/3/library/dataclasses.html
|
|
|
|
|
|
|
|
|
|
.. [6]
|
|
|
|
|
https://docs.python.org/3/library/typing.html
|
|
|
|
|
|
|
|
|
|
.. [7]
|
|
|
|
|
https://github.com/gvanrossum/patma/blob/master/EXAMPLES.md
|
|
|
|
|
|
|
|
|
|
.. [8]
|
|
|
|
|
https://dl.acm.org/doi/abs/10.1145/2480360.2384582
|
|
|
|
|
|
|
|
|
|
.. [9]
|
|
|
|
|
https://black.readthedocs.io/en/stable/
|
|
|
|
|
|
|
|
|
|
.. [10]
|
|
|
|
|
https://github.com/davidhalter/parso
|
|
|
|
|
|
|
|
|
|
.. [11]
|
|
|
|
|
https://github.com/Instagram/LibCST
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
.. _Appendix A:
|
|
|
|
|
|
|
|
|
|
Appendix A -- Full Grammar
|
|
|
|
|
==========================
|
|
|
|
|
|
|
|
|
|
Here is the full grammar for ``match_stmt``. This is an additional
|
|
|
|
|
alternative for ``compound_stmt``. It should be understood that
|
|
|
|
|
``match`` and ``case`` are soft keywords, i.e. they are not reserved
|
|
|
|
|
words in other grammatical contexts (including at the start of a line
|
|
|
|
|
if there is no colon where expected). By convention, hard keywords
|
|
|
|
|
use single quotes while soft keywords use double quotes.
|
|
|
|
|
|
|
|
|
|
Other notation used beyond standard EBNF:
|
|
|
|
|
|
|
|
|
|
- ``SEP.RULE+`` is shorthand for ``RULE (SEP RULE)*``
|
|
|
|
|
- ``!RULE`` is a negative lookahead assertion
|
|
|
|
|
|
|
|
|
|
::
|
|
|
|
|
|
|
|
|
|
match_expr:
|
|
|
|
|
| star_named_expression ',' star_named_expressions?
|
|
|
|
|
| named_expression
|
|
|
|
|
match_stmt: "match" match_expr ':' NEWLINE INDENT case_block+ DEDENT
|
|
|
|
|
case_block: "case" patterns [guard] ':' block
|
|
|
|
|
guard: 'if' named_expression
|
|
|
|
|
patterns: value_pattern ',' [values_pattern] | pattern
|
|
|
|
|
pattern: walrus_pattern | or_pattern
|
|
|
|
|
walrus_pattern: NAME ':=' or_pattern
|
|
|
|
|
or_pattern: '|'.closed_pattern+
|
|
|
|
|
closed_pattern:
|
|
|
|
|
| capture_pattern
|
|
|
|
|
| literal_pattern
|
|
|
|
|
| constant_pattern
|
|
|
|
|
| group_pattern
|
|
|
|
|
| sequence_pattern
|
|
|
|
|
| mapping_pattern
|
|
|
|
|
| class_pattern
|
|
|
|
|
capture_pattern: NAME !('.' | '(' | '=')
|
|
|
|
|
literal_pattern:
|
|
|
|
|
| signed_number !('+' | '-')
|
|
|
|
|
| signed_number '+' NUMBER
|
|
|
|
|
| signed_number '-' NUMBER
|
|
|
|
|
| strings
|
|
|
|
|
| 'None'
|
|
|
|
|
| 'True'
|
|
|
|
|
| 'False'
|
|
|
|
|
constant_pattern: '.' NAME !('.' | '(' | '=') | '.'? attr !('.' | '(' | '=')
|
|
|
|
|
group_pattern: '(' patterns ')'
|
|
|
|
|
sequence_pattern: '[' [values_pattern] ']' | '(' ')'
|
|
|
|
|
mapping_pattern: '{' items_pattern? '}'
|
|
|
|
|
class_pattern:
|
|
|
|
|
| name_or_attr '(' ')'
|
|
|
|
|
| name_or_attr '(' ','.pattern+ ','? ')'
|
|
|
|
|
| name_or_attr '(' ','.keyword_pattern+ ','? ')'
|
|
|
|
|
| name_or_attr '(' ','.pattern+ ',' ','.keyword_pattern+ ','? ')'
|
|
|
|
|
signed_number: NUMBER | '-' NUMBER
|
|
|
|
|
attr: name_or_attr '.' NAME
|
|
|
|
|
name_or_attr: attr | NAME
|
|
|
|
|
values_pattern: ','.value_pattern+ ','?
|
|
|
|
|
items_pattern: ','.key_value_pattern+ ','?
|
|
|
|
|
keyword_pattern: NAME '=' or_pattern
|
|
|
|
|
value_pattern: '*' capture_pattern | pattern
|
|
|
|
|
key_value_pattern:
|
|
|
|
|
| (literal_pattern | constant_pattern) ':' or_pattern
|
|
|
|
|
| '**' capture_pattern
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Copyright
|
|
|
|
|
=========
|
|
|
|
|
|
|
|
|
|
This document is placed in the public domain or under the
|
|
|
|
|
CC0-1.0-Universal license, whichever is more permissive.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
..
|
|
|
|
|
Local Variables:
|
|
|
|
|
mode: indented-text
|
|
|
|
|
indent-tabs-mode: nil
|
|
|
|
|
sentence-end-double-space: t
|
|
|
|
|
fill-column: 70
|
|
|
|
|
coding: utf-8
|
|
|
|
|
End:
|