PEP: 622
Title: Structural Pattern Matching
Version: $Revision$
Last-Modified: $Date$
Author: Brandt Bucher <brandtbucher@gmail.com>,
        Tobias Kohn <kohnt@tobiaskohn.ch>,
        Ivan Levkivskyi <levkivskyi@gmail.com>,
        Guido van Rossum <guido@python.org>,
        Talin <viridia@gmail.com>
BDFL-Delegate:
Discussions-To: Python-Dev <python-dev@python.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 23-Jun-2020
Python-Version: 3.10
Post-History: 23-Jun-2020
Resolution:
Abstract
========
This PEP proposes adding pattern matching statements [1]_ to Python in
order to create more expressive ways of handling structured
heterogeneous data. The authors take a holistic approach, providing
both static and runtime specifications.
:pep:`275` and :pep:`3103` previously proposed similar constructs, and
were rejected. Instead of targeting the optimization of
``if ... elif ... else`` statements (as those PEPs did), this design
focuses on generalizing sequence, mapping, and object destructuring.
It uses syntactic features made possible by :pep:`617`, which
introduced a more powerful method of parsing Python source code.
Rationale and Goals
===================
Let us start from some anecdotal evidence: ``isinstance()`` is one of the most
called functions in large scale Python code-bases (by static call count).
In particular, when analyzing some multi-million line production code base,
it was discovered that ``isinstance()`` is the second most called builtin
function (after ``len()``). Even taking into account builtin classes, it is
still in the top ten. Most of such calls are followed by specific attribute
access.
There are two possible conclusions that can be drawn from this information:

* Handling of heterogeneous data (i.e. situations where a variable can take
  values of multiple types) is common in real world code.

* Python doesn't have expressive ways of destructuring object data (i.e.
  separating the content of an object into multiple variables).

This is in contrast with the opposite sides of both aspects:

* Its success in the numeric world indicates that Python is good when
  working with homogeneous data. It also has builtin support for homogeneous
  data structures such as lists and arrays, and semantic constructs such
  as iterators and generators.

* Python is expressive and flexible at constructing objects. It has syntactic
  support for collection literals and comprehensions. Custom objects can be
  created using positional and keyword calls that are customized by the special
  ``__init__()`` method.

This PEP aims at improving the support for destructuring heterogeneous data
by adding dedicated syntactic support for it in the form of pattern matching.
On a very high level it is similar to regular expressions, but instead of
matching strings, it will be possible to match arbitrary Python objects.
We believe this will improve both readability and reliability of relevant code.
To illustrate the readability improvement, let us consider an actual example
from the Python standard library::

    def is_tuple(node):
        if isinstance(node, Node) and node.children == [LParen(), RParen()]:
            return True
        return (isinstance(node, Node)
                and len(node.children) == 3
                and isinstance(node.children[0], Leaf)
                and isinstance(node.children[1], Node)
                and isinstance(node.children[2], Leaf)
                and node.children[0].value == "("
                and node.children[2].value == ")")

With the syntax proposed in this PEP it can be rewritten as below. Note that
the proposed code will work without any modifications to the definition of
``Node`` and other classes here::

    def is_tuple(node: Node) -> bool:
        match node:
            case Node(children=[LParen(), RParen()]):
                return True
            case Node(children=[Leaf(value="("), Node(), Leaf(value=")")]):
                return True
            case _:
                return False

See the `syntax`_ sections below for a more detailed specification.
Similarly to how constructing objects can be customized by a user-defined
``__init__()`` method, we propose that destructuring objects can be customized
by a new special ``__match__()`` method. As part of this PEP we specify the
general ``__match__()`` API, its implementation for ``object.__match__()``,
and for some standard library classes (including :pep:`557` dataclasses). See
the `runtime`_ section below.
Finally, we aim to provide comprehensive support for static type checkers
and similar tools. For this purpose we propose to introduce a
``@typing.sealed`` class decorator that will be a no-op at runtime, but
will indicate to static tools that all subclasses of this class must be defined
in the same module. This will allow effective static exhaustiveness checks,
and together with dataclasses, will provide good support for algebraic data
types [2]_. See the `static checkers`_ section for more details.
In general, we believe that pattern matching has proven to be a useful and
expressive tool in various modern languages. In particular, many aspects of
this PEP were inspired by how pattern matching works in Rust [3]_ and
Scala [4]_.
.. _syntax:
Syntax and Semantics
====================
Case clauses
------------
A simplified, approximate grammar for the proposed syntax is::

    ...
    compound_statement:
        | if_stmt
        ...
        | match_stmt
    match_stmt: "match" expression ':' NEWLINE INDENT case_block+ DEDENT
    case_block: "case" pattern [guard] ':' block
    guard: 'if' expression
    pattern: NAME ':=' or_pattern | or_pattern
    or_pattern: closed_pattern ('|' closed_pattern)*
    closed_pattern:
        | literal_pattern
        | name_pattern
        | constant_pattern
        | sequence_pattern
        | mapping_pattern
        | class_pattern

(See `Appendix A`_ for the full, unabridged grammar.)

We propose the match syntax to be a statement, not an expression. Although in
many languages it is an expression, being a statement better suits the general
logic of Python syntax. See `rejected ideas`_ for more discussion. The list of
allowed patterns is specified below in the `patterns`_ subsection.
The ``match`` and ``case`` keywords are proposed to be soft keywords,
so that they are recognized as keywords at the beginning of a match
statement or case block respectively, but are allowed to be used in
other places as variable or argument names.
The proposed indentation structure is as follows::

    match some_expression:
        case pattern_1:
            ...
        case pattern_2:
            ...

Match semantics
---------------
The proposed high-level semantics are to choose the first
matching pattern and execute the corresponding suite. The remaining patterns
are not tried. If there are no matching patterns, the statement 'falls
through', and execution continues at the following statement.
Essentially this is equivalent to a chain of ``if ... elif ... else``
statements. Note that unlike for the previously proposed ``switch`` statement,
the pre-computed dispatch dictionary semantics does not apply here.
There is no ``default`` or ``else`` case - instead the special wildcard
``_`` can be used (see the section on `name_pattern`_) as a final
'catch-all' pattern.
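
The first-match-wins rule can be pictured with an ordinary
``if ... elif ... else`` chain in today's Python. The sketch below is only an
illustration of the ordering; the function ``describe`` and its three cases
are hypothetical and not part of this proposal.

```python
def describe(value):
    """Emulate first-match-wins dispatch with an if/elif chain."""
    if value == 0:                    # like ``case 0:``
        return "zero"
    elif isinstance(value, int):      # like ``case int():``
        # Not reached for 0, because the first case already matched.
        return "some integer"
    else:                             # like the final ``case _:``
        return "something else"
```

Reordering the first two branches would make the ``0`` case unreachable, which
is why the order of case clauses matters.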
Name bindings made during a successful pattern match outlive the executed suite
and can be used after the match statement. This follows the logic of other
Python statements that can bind names, such as the ``for`` loop and the ``with``
statement. For example::

    match shape:
        case Point(x, y):
            ...
        case Rectangle(x, y, _, _):
            ...
    print(x, y)  # This works

.. _patterns:
Allowed patterns
----------------
We introduce the proposed syntax gradually. Here we start from the main
building blocks. The following patterns are supported:
.. _literal_pattern:
Literal Pattern
~~~~~~~~~~~~~~~
A literal pattern consists of a simple literal like a string, a number,
a Boolean literal (``True`` or ``False``), or ``None``::

    match number:
        case 0:
            print("Nothing")
        case 1:
            print("Just one")
        case 2:
            print("A couple")
        case -1:
            print("One less than nothing")
        case 1-1j:
            print("Good luck with that...")

A literal pattern is compared using equality, with the literal on the
right-hand side, so in the above example ``number == 0`` and then possibly
``number == 1``, and so on, will be evaluated. Note that although technically
negative numbers are represented using unary minus, they are considered
literals for the purpose of pattern matching. Unary plus is not allowed.
Binary plus and minus are allowed only to join a real number and an imaginary
number to form a complex number, such as ``1+1j``.
Note that because equality (``__eq__``) is used, and because Booleans compare
equal to the integers ``0`` and ``1``, there is no practical difference
between the following two::

    case True:
        ...

    case 1:
        ...

Triple-quoted strings are supported. Raw strings and byte strings
are supported. F-strings are not allowed (since in general they are not
really literals).
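
The equality rule can be checked in current Python without any new syntax.
The helper ``matches_literal`` below is a hypothetical stand-in for what a
literal pattern does, assuming nothing beyond the ``==`` comparison described
above.

```python
def matches_literal(target, literal):
    """Emulate a literal pattern: equality, with the literal on the right."""
    return target == literal


# bool is a subclass of int, so True and 1 are interchangeable here.
true_matches_one = matches_literal(True, 1)
one_matches_true = matches_literal(1, True)

# A complex literal such as 1-1j is compared the same way.
complex_matches = matches_literal(complex(1, -1), 1-1j)
```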
.. _name_pattern:
Name Pattern
~~~~~~~~~~~~
A name pattern serves as an assignment target for the matched expression::

    match greeting:
        case "":
            print("Hello!")
        case name:
            print(f"Hi {name}!")

A name pattern always succeeds. A name pattern appearing in a scope makes
the name local to that scope. For example, using ``name`` after the above
snippet may raise ``UnboundLocalError`` rather than ``NameError``, if
the ``""`` case clause was taken::

    match greeting:
        case "":
            print("Hello!")
        case name:
            print(f"Hi {name}!")
    if name == "Santa":      # <-- might raise UnboundLocalError
        ...                  # but works fine if greeting was not empty

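
The same scoping behaviour already exists for ordinary assignments, which
makes it easy to reproduce today. In the sketch below the hypothetical
function ``greet`` stands in for the match statement above: only the second
branch binds ``name``, yet the name is local to the whole function.

```python
def greet(greeting):
    # Emulates the two case clauses: only the second one binds ``name``.
    if greeting == "":
        reply = "Hello!"
    else:
        name = greeting
        reply = f"Hi {name}!"
    # ``name`` is a local variable of greet() because it is assigned
    # somewhere in the function body, so reading it while it is still
    # unbound raises UnboundLocalError.
    return reply, name == "Santa"
```

Calling ``greet("Santa")`` succeeds, while ``greet("")`` raises
``UnboundLocalError`` on the last line.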
While matching against each case clause, a name may be bound at most
once; having two name patterns with coinciding names is an error. An
exception is made for the special single underscore (``_``) name; in
patterns, it's a wildcard that *never* binds::

    match data:
        case [x, x]:  # Error!
            ...
        case [_, _]:
            print("Some pair")
            print(_)  # Error!

Note: one can still match on a collection with equal items using `guards`_.
Also, ``[x, y] | Point(x, y)`` is a legal pattern because the two
alternatives are never matched at the same time.
Reminder: ``None``, ``False`` and ``True`` are keywords denoting
literals, not names.
.. _constant_value_pattern:
Constant Value Pattern
~~~~~~~~~~~~~~~~~~~~~~
This is used to match against constants and enum values.
Every dotted name in a pattern is looked up using normal Python name
resolution rules, and the value is used for comparison by equality with
the matching expression (same as for literals). As a special case to avoid
ambiguity with name patterns, simple names must be prefixed with a dot to be
considered a reference::

    from enum import Enum

    class Color(Enum):
        BLACK = 1
        RED = 2

    BLACK = 1
    RED = 2

    match color:
        case .BLACK | Color.BLACK:
            print("Black suits every color")
        case BLACK:  # This will just assign a new value to BLACK.
            ...

The leading dot can be omitted if the name is already dotted, but
adding it is not prohibited, so ``.Color.BLACK`` is the same as ``Color.BLACK``.
See `rejected ideas`_ for other syntactic alternatives that were considered
for constant value pattern.
.. _sequence_pattern:
Sequence Pattern
~~~~~~~~~~~~~~~~
A sequence pattern follows the same semantics as unpacking assignment.
Like unpacking assignment, both tuple-like and list-like syntax can be
used, with identical semantics. Each element can be an arbitrary
pattern; there may also be at most one ``*name`` pattern to catch all
remaining items::

    match collection:
        case 1, [x, *others]:
            print("Got 1 and a nested sequence")
        case (1, x):
            print(f"Got 1 and {x}")

To match a sequence pattern the target must be an instance of
``collections.abc.Sequence``, and it cannot be any kind of string
(``str``, ``bytes``, ``bytearray``). It cannot be an iterator. For matching
on a specific collection class, see class pattern below.
The ``_`` wildcard can be starred to match sequences of varying lengths. For
example:

* ``[*_]`` matches a sequence of any length.
* ``(_, _, *_)`` matches any sequence of length two or more.
* ``["a", *_, "z"]`` matches any sequence of length two or more that starts with
  ``"a"`` and ends with ``"z"``.
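
The eligibility rule for targets and the starred wildcard can be emulated with
plain Python checks. The helper names ``is_sequence_target`` and
``matches_a_star_z`` are hypothetical; the sketch only assumes the rules
stated in this section.

```python
from collections.abc import Sequence


def is_sequence_target(obj):
    """Return True if obj may be matched by a sequence pattern."""
    # Strings and byte strings are sequences, but are explicitly excluded.
    if isinstance(obj, (str, bytes, bytearray)):
        return False
    # Iterators and generators are not Sequence instances, so they fail here.
    return isinstance(obj, Sequence)


def matches_a_star_z(obj):
    """Emulate the pattern ``["a", *_, "z"]``."""
    return (is_sequence_target(obj)
            and len(obj) >= 2
            and obj[0] == "a"
            and obj[-1] == "z")
```

Because only the length and the two end items are inspected, the items
swallowed by ``*_`` are never examined.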
.. _mapping_pattern:
Mapping Pattern
~~~~~~~~~~~~~~~
A mapping pattern is a generalization of iterable unpacking to mappings.
Its syntax is similar to a dictionary display, but each key and value is a
pattern: ``"{" (pattern ":" pattern)+ "}"``. A ``**name`` pattern is also
allowed, to extract the remaining items. Only literal and constant value
patterns are allowed in key positions::

    import constants

    match config:
        case {"route": route}:
            process_route(route)
        case {constants.DEFAULT_PORT: sub_config, **rest}:
            process_config(sub_config, rest)

The target must be an instance of ``collections.abc.Mapping``.
Extra keys in the target are ignored even if ``**rest`` is not present.
This is different from sequence pattern, where extra items will cause a
match to fail. But mappings are actually different from sequences: they
have natural structural sub-typing behavior, i.e., passing a dictionary
with extra keys somewhere will likely just work.
For this reason, ``**_`` is invalid in mapping patterns; it would always be a
no-op that could be removed without consequence.
Matched key-value pairs must already be present in the mapping, and not created
on-the-fly by ``__missing__`` or ``__getitem__``. For example,
``collections.defaultdict`` instances will only match patterns with keys that
were already present when the ``match`` block was entered.
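
A minimal sketch of these rules in current Python follows. The function
``match_mapping`` and the sentinel ``_MISSING`` are hypothetical names; sub-patterns
are simplified to name patterns, so the function just returns the values found
under the requested keys.

```python
from collections import defaultdict
from collections.abc import Mapping

_MISSING = object()  # hypothetical sentinel marking an absent key


def match_mapping(target, keys):
    """Return {key: value} for the requested keys, or None if the match fails."""
    if not isinstance(target, Mapping):
        return None
    found = {}
    for key in keys:
        # For dict and its subclasses, get() never calls __missing__, so a
        # defaultdict does not grow new entries as a side effect.
        value = target.get(key, _MISSING)
        if value is _MISSING:
            return None
        found[key] = value
    # Keys of target that were not requested are simply ignored.
    return found
```

For example, a ``defaultdict(list)`` that has never seen the key ``"route"``
does not match, and it still has no ``"route"`` entry afterwards.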
.. _class_pattern:
Class Pattern
~~~~~~~~~~~~~
A class pattern provides support for destructuring arbitrary objects.
There are two possible ways of matching on object attributes: by position
like ``Point(1, 2)``, and by name like ``Point(x=1, y=2)``. These
two can be combined, but positional match cannot follow a match by name.
Each item in a class pattern can be an arbitrary pattern. A simple
example::

    match shape:
        case Point(x, y):
            ...
        case Rectangle(x0, y0, x1, y1, painted=True):
            ...

Whether a match succeeds or not is determined by calling a special
``__match__()`` method on the class named in the pattern
(``Point`` and ``Rectangle`` in the example),
with the value being matched (``shape``) as the only argument.
If the method returns ``None``, the match fails, otherwise the
match continues w.r.t. attributes of the returned proxy object, see details
in `runtime`_ section.
The named class must be an instance of ``type``. It may be a single name
or a dotted name (e.g. ``some_mod.SomeClass`` or ``mod.pkg.Class``).
The leading name must not be ``_``, so e.g. ``_(...)`` and
``_.C(...)`` are invalid. Use ``object(foo=_)`` to check whether the
matched object has an attribute ``foo``.
This PEP only fully specifies the behavior of ``__match__()`` for ``object``
and some builtin and standard library classes; custom classes are only
required to follow the protocol specified in the `runtime`_ section. After all,
the authors of a class know best how to "revert" the logic of the
``__init__()`` they wrote. The runtime will then chain these calls to allow
matching against arbitrarily nested patterns.
Combining multiple patterns
---------------------------
Multiple alternative patterns can be combined into one using ``|``. This means
the whole pattern matches if at least one alternative matches.
Alternatives are tried from left to right and short-circuit: subsequent
alternatives are not tried once one has matched. Examples::

    match something:
        case 0 | 1 | 2:
            print("Small number")
        case [] | [_]:
            print("A short sequence")
        case str() | bytes():
            print("Something string-like")
        case _:
            print("Something else")

The alternatives may bind variables, as long as each alternative binds
the same set of variables (excluding ``_``). For example::

    match something:
        case 1 | x:  # Error!
            ...
        case x | 1:  # Error!
            ...
        case one := [1] | two := [2]:  # Error!
            ...
        case Foo(arg=x) | Bar(arg=x):  # Valid, both arms bind 'x'
            ...
        case [x] | x:  # Valid, both arms bind 'x'
            ...

.. _guards:
Guards
------
Each *top-level* pattern can be followed by a guard of the form
``if expression``. A case clause succeeds if the pattern matches and the guard
evaluates to a true value. For example::

    match input:
        case [x, y] if x > MAX_INT and y > MAX_INT:
            print("Got a pair of large numbers")
        case x if x > MAX_INT:
            print("Got a large number")
        case [x, y] if x == y:
            print("Got equal items")
        case _:
            print("Not an outstanding input")

If evaluating a guard raises an exception, it is propagated onwards rather
than failing the case clause. Names that appear in a pattern are bound before
the guard is evaluated, and stay bound even if the guard fails. So this will
work::

    values = [0]

    match values:
        case [x] if x:
            ...  # This is not executed
        case _:
            ...
    print(x)  # This will print "0"

Note that guards are not allowed for nested patterns, so that ``[x if x > 0]``
is a ``SyntaxError`` and ``1 | 2 if 3 | 4`` will be parsed as
``(1 | 2) if (3 | 4)``.
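
The order of events (bind first, then evaluate the guard) can be spelled out
in plain Python. The function ``match_single_truthy`` is a hypothetical
emulation of ``case [x] if x:`` followed by ``case _:``; it records bindings
in a dictionary instead of creating real local variables.

```python
def match_single_truthy(values):
    """Emulate ``case [x] if x:`` followed by ``case _:``."""
    bindings = {}
    if isinstance(values, list) and len(values) == 1:
        # The pattern [x] matched, so x is bound right away ...
        bindings["x"] = values[0]
        # ... and only then is the guard evaluated.
        if bindings["x"]:
            return "first case", bindings
    # A failed guard does not undo the binding made above.
    return "wildcard case", bindings
```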
.. _named:
Named sub-patterns
------------------
It is often useful to match a sub-pattern *and* to bind the corresponding
value to a name. For example, it can be useful to write more efficient
matches, or simply to avoid repetition. To simplify such cases, a name pattern
can be combined with another arbitrary pattern using named sub-patterns of
the form ``name := pattern``. For example::

    match get_shape():
        case Line(start := Point(x, y), end) if start == end:
            print(f"Zero length line at {x}, {y}")

Note that the name pattern used in the named sub-pattern can be used in
the match suite, or after the match statement. However, the name will
*only* be bound if the sub-pattern succeeds. Another example::

    match group_shapes():
        case [], [point := Point(x, y), *other]:
            print(f"Got {point} in the second group")
            process_coordinates(x, y)
            ...

Technically, most such examples can be rewritten using guards and/or nested
match statements, but this will be less readable and/or will produce less
efficient code. Essentially, most of the arguments in PEP 572 apply here
equally.
``_`` is not a valid name here.
.. _runtime:
Runtime specification
=====================
The ``__match__()`` protocol
----------------------------
TODO: Show equivalent pseudo code.
The ``__match__()`` method is used to decide whether an object matches
a given class pattern and to extract the corresponding attributes. It
must be a class method or a static method returning an object
(typically the same as the argument), or ``None`` to indicate that no
match is possible. (More about the return value in the next section.)
The procedure is as follows:

* The class object for ``Class`` in ``Class(<sub-patterns>)`` is looked up and
  ``Class.__match__(obj)`` is called where ``obj`` is the value being matched.

* If the result of the call (which we are referring to as "match proxy") is
  ``None``, the match fails.

* Otherwise, if any sub-patterns are given in the form of positional
  or keyword arguments, these are matched from left to right, as
  follows. The match fails as soon as a sub-pattern fails; if all
  sub-patterns succeed, the overall class pattern match succeeds.

* If there are match-by-position items and the class has a
  ``__match_args__``, the item at position ``i``
  is matched against the value looked up by attribute
  ``__match_args__[i]``. For example, a pattern ``Point2D(5, 8)``,
  where ``Point2D.__match_args__ == ["x", "y"]``, is translated
  (approximately) into ``obj.x == 5 and obj.y == 8``.

* If there are more positional items than the length of ``__match_args__``, an
  ``ImpossibleMatchError`` is raised.

* If the ``__match_args__`` attribute is absent on the matched class,
  but more than one positional item appears in a match,
  ``ImpossibleMatchError`` is also raised. We don't fall back on
  using ``__slots__`` or ``__annotations__`` -- "In the face of ambiguity,
  refuse the temptation to guess."

* If there are any match-by-keyword items the keywords are looked up
  as attributes on the proxy. If the lookup succeeds the value is
  matched against the corresponding sub-pattern. If the lookup fails,
  two cases are distinguished:

  * If an attribute is missing on the proxy and the class being matched
    has no ``__match_args__`` attribute, the match
    fails. This allows one to write ``case object(name=_)`` to
    implement a check for the presence of a given attribute, or ``case
    object(name=var)`` to check for its presence and extract its value.

  * If an attribute is missing and the class has a ``__match_args__``,
    the match fails if the attribute name is in
    ``__match_args__``, else the match raises ``ImpossibleMatchError``.

Such a protocol favors simplicity of implementation over flexibility and
performance. For other considered alternatives, see `rejected ideas`_.
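
The procedure can be sketched as ordinary Python. This is an illustration
under stated assumptions, not the reference implementation: the base class
``Matchable``, the function ``match_class`` and the sentinel ``_MISSING`` are
hypothetical names, sub-patterns are simplified to constants compared with
``==``, and any positional item on a class without ``__match_args__`` raises
(the text above raises only for more than one). The duplicate-target check
follows the ``Point2D(1, 2, y=3)`` rule given for ambiguous matches.

```python
class ImpossibleMatchError(TypeError):
    """Stand-in for the exception proposed by this PEP."""


class Matchable:
    """Hypothetical base class supplying a default ``__match__``."""

    @classmethod
    def __match__(cls, instance):
        return instance if isinstance(instance, cls) else None


class Point2D(Matchable):
    __match_args__ = ["x", "y"]

    def __init__(self, x, y):
        self.x = x
        self.y = y


_MISSING = object()


def match_class(cls, obj, positional=(), keywords=None):
    """Match obj against ``cls(*positional, **keywords)``; return a bool."""
    proxy = cls.__match__(obj)
    if proxy is None:
        return False
    match_args = getattr(cls, "__match_args__", None)
    if positional and match_args is None:
        # Simplification: stricter than the text for a single item.
        raise ImpossibleMatchError("positional items need __match_args__")
    if match_args is not None and len(positional) > len(match_args):
        raise ImpossibleMatchError("too many positional items")
    # Positional items are translated to attribute names, then keyword
    # items follow, preserving left-to-right order.
    items = [(match_args[i], value) for i, value in enumerate(positional)]
    for name, value in (keywords or {}).items():
        if any(name == seen for seen, _ in items):
            raise ImpossibleMatchError(f"{name!r} is targeted twice")
        items.append((name, value))
    for name, expected in items:
        actual = getattr(proxy, name, _MISSING)
        if actual is _MISSING:
            if match_args is None or name in match_args:
                return False
            raise ImpossibleMatchError(f"unknown attribute {name!r}")
        if actual != expected:
            return False  # stop at the first failing sub-pattern
    return True
```

With these definitions, ``match_class(Point2D, Point2D(5, 8), (5, 8))``
succeeds, while ``match_class(Point2D, Point2D(5, 8), (1, 2), {"y": 3})``
raises ``ImpossibleMatchError``.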
For the most commonly-matched built-in types (``bool``,
``bytearray``, ``bytes``, ``dict``, ``float``,
``frozenset``, ``int``, ``list``, ``set``, ``str``, and ``tuple``), a
single positional sub-pattern is allowed to be passed to
the call. Rather than being matched against any particular attribute
on the proxy, it is instead matched against the proxy itself. This
creates behavior that is useful and intuitive for these objects:
* ``bool(False)`` matches ``False`` (but not ``0``).
* ``tuple((0, 1, 2))`` matches ``(0, 1, 2)`` (but not ``[0, 1, 2]``).
* ``int(i)`` matches any ``int`` and binds it to the name ``i``.
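
The three examples above amount to a type check followed by a comparison with
the object itself. The helper ``match_builtin`` below is a hypothetical
emulation in current Python, with the sub-pattern simplified to a constant.

```python
def match_builtin(cls, obj, expected):
    """Emulate e.g. ``bool(False)``: check the type, then compare obj itself."""
    return isinstance(obj, cls) and obj == expected
```

Although ``0 == False`` is true, ``0`` is not an instance of ``bool``, so the
type check is what keeps ``bool(False)`` from matching ``0``.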
Result value of ``__match__()``
-------------------------------
If a match is successful, the ``__match__()`` method should return an object
whose attribute values will then be bound to the corresponding keyword argument
names in the pattern after the match is complete. For each possible name that is
legal in the match pattern, the returned object should have a corresponding attribute
with that name, that can be used to access that value.
(Positional sub-patterns are matched to keyword sub-patterns using
``__match_args__`` as shown in the previous section.)
For most ordinary objects, this returned object can simply be the original object,
unchanged.
However, there may be cases where the internal implementation of a class is
very different from its public representation, for example a ``Point`` class with
``x``, ``y`` and ``z`` attributes may be represented internally as a vector; in such cases
a 'proxy object' may be returned whose attributes correspond to the matchable names.
There is no requirement that the attributes on the proxy object be the same type or
value as the attributes of the original object; one envisioned use case is for
expensive-to-compute properties to be computed lazily on the proxy object via
property getters.
In deciding what names should be available for matching, the recommended practice
is that class patterns should be the mirror of construction; that is, the set of
available names and their types should resemble the arguments to ``__init__()``.
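
A sketch of the ``Point`` example follows, assuming the ``__match__()``
protocol as described in this PEP. The proxy class ``_PointProxy`` and the
private ``_vector`` attribute are hypothetical; the properties compute their
values only when a pattern actually asks for them.

```python
class _PointProxy:
    """Hypothetical match proxy exposing x, y and z lazily."""

    def __init__(self, vector):
        self._vector = vector

    @property
    def x(self):
        return self._vector[0]

    @property
    def y(self):
        return self._vector[1]

    @property
    def z(self):
        return self._vector[2]


class Point:
    # Mirrors the arguments of __init__(), as recommended above.
    __match_args__ = ["x", "y", "z"]

    def __init__(self, x, y, z):
        # The internal representation differs from the public one.
        self._vector = (x, y, z)

    @classmethod
    def __match__(cls, instance):
        if not isinstance(instance, cls):
            return None  # no match is possible
        return _PointProxy(instance._vector)
```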
Ambiguous matches
-----------------
Impossible and ambiguous matches are detected at runtime and a special
exception ``ImpossibleMatchError`` (proposed to be a subclass of ``TypeError``)
will be raised. In addition to basic checks described in the previous
subsection:
* The interpreter will check that two match items are not targeting the same
attribute, for example ``Point2D(1, 2, y=3)`` is an error.
Special attribute ``__match_args__``
------------------------------------
The ``__match_args__`` attribute complements the ``__match__`` method and is
always looked up on the same class as the ``__match__`` method.
``__match_args__``, if it is present, must be a list or
tuple of strings naming the allowed positional arguments.
Default ``object.__match__()``
------------------------------
The default implementation aims at providing a basic, useful (but still safe)
experience with pattern matching out of the box. For this purpose the default
``__match__()`` method follows this logic (pseudo-code)::

    class object:
        @classmethod
        def __match__(cls, instance):
            if isinstance(instance, cls):
                return instance

This means that pattern matching is allowed by default for every class. If
a class wants to disallow pattern matching against itself, it should define
``__match__ = None``. This will cause an exception when trying to match
against such a class.
The above implementation means that by default only match-by-name will
work, and classes should define ``__match_args__`` (e.g. as a class
attribute) if they would like to support match-by-position. Additionally,
dataclasses and named tuples will support match-by-position out of the box.
See below for more details.

Finally, all attributes are exposed for matching; if a class wants to hide
some attributes from matching against them, a custom ``__match__()`` method is
required.

The standard library
--------------------
To facilitate the use of pattern matching, several changes will be made to
the standard library:

* Namedtuples and dataclasses will have auto-generated ``__match_args__``.

* For dataclasses the order of attributes in the generated ``__match_args__``
  will be the same as the order of corresponding arguments in the generated
  ``__init__()`` method. This includes the situations where attributes are
  inherited from a superclass.

In addition, a systematic effort will be put into going through existing
standard library classes and adding custom ``__match__()`` and/or
``__match_args__`` where it looks beneficial.
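
The ordering rule for dataclasses can be previewed with the existing
``dataclasses.fields()`` function. The helper ``generated_match_args`` is a
hypothetical illustration of what the generated attribute could contain; it
assumes that only fields accepted by ``__init__()`` are listed.

```python
from dataclasses import dataclass, fields


def generated_match_args(cls):
    """Field names in the order of the generated __init__() parameters."""
    # fields() lists inherited fields first, in definition order.
    return [f.name for f in fields(cls) if f.init]


@dataclass
class Base:
    x: int


@dataclass
class Child(Base):
    y: int
    z: int = 0
```

Here ``generated_match_args(Child)`` yields ``["x", "y", "z"]``, the same
order in which ``Child(x, y, z)`` takes its arguments.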
.. _static checkers:
Static checkers specification
=============================
Exhaustiveness checks
---------------------
From a reliability perspective, experience shows that missing a case when
dealing with a set of possible data values leads to hard-to-debug issues,
thus forcing people to add safety asserts like this::

    def get_first(data: Union[int, list[int]]) -> int:
        if isinstance(data, list) and data:
            return data[0]
        elif isinstance(data, int):
            return data
        else:
            assert False, "should never get here"

PEP 484 specifies that static type checkers should support exhaustiveness in
conditional checks with respect to enum values. PEP 586 later generalized this
requirement to literal types.
This PEP further generalizes this requirement to
arbitrary patterns. A typical situation where this applies is matching an
expression with a union type::

    def classify(val: Union[int, Tuple[int, int], List[int]]) -> str:
        match val:
            case [x, *other]:
                return f"A sequence starting with {x}"
            case [x, y] if x > 0 and y > 0:
                return f"A pair of {x} and {y}"
            case int():
                return "Some integer"
            # Type-checking error: some cases unhandled.

The exhaustiveness checks should also apply where both pattern matching
and enum values are combined::

    from enum import Enum
    from typing import Union

    class Level(Enum):
        BASIC = 1
        ADVANCED = 2
        PRO = 3

    class User:
        name: str
        level: Level

    class Admin:
        name: str

    account: Union[User, Admin]

    match account:
        case Admin(name=name) | User(name=name, level=Level.PRO):
            ...
        case User(level=Level.ADVANCED):
            ...
        # Type-checking error: basic user unhandled

Obviously, no ``Matchable`` protocol (in terms of PEP 544) is needed, since
every class is matchable and therefore is subject to the checks specified
above.
Sealed classes as algebraic data types
--------------------------------------
Quite often it is desirable to apply exhaustiveness to a set of classes without
defining ad-hoc union types, which is itself fragile if a class is missing in
the union definition. A design pattern where a group of record-like classes is
combined into a union is popular in other languages that support pattern
matching and is known as algebraic data types [2]_.

We propose to add a special class decorator ``@sealed`` to the ``typing``
module [6]_, that will have no effect at runtime, but will indicate to static
type checkers that all subclasses (direct and indirect) of this class should
be defined in the same module as the base class.

The idea is that since all subclasses are known, the type checker can treat
the sealed base class as a union of all its subclasses. Together with
dataclasses this allows clean and safe support of algebraic data types
in Python. Consider this example::

    from dataclasses import dataclass
    from typing import sealed

    @sealed
    class Node:
        ...

    class Expression(Node):
        ...

    class Statement(Node):
        ...

    @dataclass
    class Name(Expression):
        name: str

    @dataclass
    class Operation(Expression):
        left: Expression
        op: str
        right: Expression

    @dataclass
    class Assignment(Statement):
        target: str
        value: Expression

    @dataclass
    class Print(Statement):
        value: Expression

With such a definition, a type checker can safely treat ``Node`` as
``Union[Name, Operation, Assignment, Print]``, and also safely treat e.g.
``Expression`` as ``Union[Name, Operation]``. So this will result in a type
checking error in the below snippet, because ``Name`` is not handled (and the
type checker can give a useful error message)::

    def dump(node: Node) -> str:
        match node:
            case Assignment(target, value):
                return f"{target} = {dump(value)}"
            case Print(value):
                return f"print({dump(value)})"
            case Operation(left, op, right):
                return f"({dump(left)} {op} {dump(right)})"

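
The idea that a sealed base class stands for the union of its concrete
subclasses can be illustrated at runtime. In the sketch below, ``sealed`` is a
local no-op stand-in for the proposed decorator (``typing.sealed`` is not
assumed to exist), and ``leaf_subclasses`` is a hypothetical helper; a real
type checker would derive the same set statically from the module source.

```python
def sealed(cls):
    """Runtime no-op stand-in for the proposed ``typing.sealed``."""
    return cls


def leaf_subclasses(cls):
    """Collect the subclasses of cls that have no subclasses themselves."""
    children = cls.__subclasses__()
    if not children:
        return [cls]
    leaves = []
    for child in children:
        leaves.extend(leaf_subclasses(child))
    return leaves


@sealed
class Node: ...

class Expression(Node): ...
class Statement(Node): ...
class Name(Expression): ...
class Operation(Expression): ...
class Assignment(Statement): ...
class Print(Statement): ...
```

``leaf_subclasses(Node)`` finds ``Name``, ``Operation``, ``Assignment`` and
``Print``, and ``leaf_subclasses(Expression)`` finds ``Name`` and
``Operation``, matching the two unions described above.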
Type erasure
------------
Class patterns are subject to runtime type erasure. Namely, although one
can define a type alias ``IntQueue = Queue[int]`` so that a pattern like
``IntQueue()`` is syntactically valid, type checkers should reject such a
match::

    queue: Union[Queue[int], Queue[str]]
    match queue:
        case IntQueue():  # Type-checking error here
            ...

Note that the above snippet actually fails at runtime with the current
implementation of generic classes in the ``typing`` module, as well as
with builtin generic classes in the recently accepted PEP 585, because
they prohibit ``isinstance`` checks.
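
The runtime failure is easy to observe with an existing generic alias. The
helper name ``isinstance_rejects`` is hypothetical; the only behaviour relied
upon is that ``isinstance()`` raises ``TypeError`` for subscripted generics.

```python
from typing import List


def isinstance_rejects(generic_alias):
    """Return True if isinstance() refuses the given parameterized generic."""
    try:
        isinstance([], generic_alias)
    except TypeError:
        return True
    return False


# A subscripted generic cannot be used for instance checks ...
rejects_parameterized = isinstance_rejects(List[int])
# ... while the plain class works as usual.
rejects_plain = isinstance_rejects(list)
```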
To clarify, generic classes are not prohibited in general from participating
in pattern matching, just that their type parameters can't be explicitly
specified. It is still fine if sub-patterns or literals bind the type
variables. For example::

    from typing import Generic, TypeVar, Union

    T = TypeVar('T')

    class Result(Generic[T]):
        first: T
        other: list[T]

    result: Union[Result[int], Result[str]]

    match result:
        case Result(first=int()):
            ...  # Type of result is Result[int] here
        case Result(other=["foo", "bar", *rest]):
            ...  # Type of result is Result[str] here

Note about constants
--------------------
The fact that a name pattern is always an assignment target may create unwanted
consequences when a user by mistake tries to "match" a value against
a constant instead of using the constant value pattern. As a result, at
runtime such a match will always succeed and moreover override the value of
the constant. It is important therefore that static type checkers warn about
such situations. For example::

    from typing import Final

    MAX_INT: Final = 2 ** 64

    value = 0

    match value:
        case MAX_INT:  # Type-checking error here: cannot assign to final name
            print("Got big number")
        case .MAX_INT:  # This is OK
            print("Got big number")
        case _:
            print("Something else")

Precise type checking of star matches
-------------------------------------
Type checkers should perform precise type checking of star items in pattern
matching giving them either a heterogeneous ``list[T]`` type, or
a ``TypedDict`` type as specified by PEP 589. For example::

    stuff: Tuple[int, str, str, float]

    match stuff:
        case a, *b, 0.5:
            # Here a is int and b is list[str]
            ...

Performance Considerations
==========================
Ideally, a ``match`` statement should have good runtime performance compared
to an equivalent chain of if-statements. Although the history of programming
languages is rife with examples of new features which increased engineer
productivity at the expense of additional CPU cycles, it would be
unfortunate if the benefits of ``match`` were counter-balanced by a significant
overall decrease in runtime performance.
That being said, because of the flexibility of ``match``, and the fact that
it can be customized via the ``__match__`` callback, there is some overhead
involved with calling these methods. Exactly how much cost this will entail
will be implementation-dependent.
In this design, an attempt has been made to avoid putting too much of a
computational burden on the ``__match__`` method. In particular, earlier
versions of the design required a custom matcher to completely re-implement
most of the pattern-matching logic that would have been performed by the VM.
The current design eschews this flexibility in favor of a simpler, faster
custom match protocol.
Although this PEP does not specify any particular implementation strategy,
a few words about the prototype implementation and how it attempts to
maximize performance are in order.
Basically, the prototype implementation transforms all of the ``match``
statement syntax into equivalent if/else blocks, or more accurately, into
Python byte codes that have the same effect. In other words, all of the
logic for testing instance types, sequence lengths, mapping keys and
so on is inlined in place of the ``match``.
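As a rough illustration of this strategy, the hand-written function below
performs the kind of checks that a sequence pattern such as ``case [x, y]:``
or a mapping pattern such as ``case {"x": x, "y": y}:`` could expand to. It is
a sketch under assumptions, not the code generated by the prototype: the
function name ``describe``, the exclusion of strings from sequence matching,
and the order of the checks are all hypothetical.

```python
from collections.abc import Mapping, Sequence


def describe(value):
    # Roughly: case [x, y]:
    # Strings and bytes are sequences too; this sketch assumes a sequence
    # pattern should not match them.
    if (isinstance(value, Sequence)
            and not isinstance(value, (str, bytes, bytearray))
            and len(value) == 2):
        x, y = value
        return f"pair {x}, {y}"
    # Roughly: case {"x": x, "y": y}:  (extra keys are ignored)
    elif isinstance(value, Mapping) and "x" in value and "y" in value:
        x, y = value["x"], value["y"]
        return f"point {x}, {y}"
    # Roughly: case _:
    else:
        return "something else"


print(describe([1, 2]))                    # pair 1, 2
print(describe({"x": 3, "y": 4, "z": 5}))  # point 3, 4
print(describe("ab"))                      # something else
```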
This is not the only possible strategy, nor is it necessarily the best.
For example, the call to ``__match__`` could be memoized, especially
if there are multiple instances of the same class type but with different
arguments in a single match statement. It is also theoretically
possible for a future implementation to process the case clauses in
parallel using a decision tree rather than testing them one by one.
For this reason, implementers of ``__match__`` should not make any
assumptions about the number of times or the order in which ``__match__``
is called.
Backwards Compatibility
=======================
This PEP is fully backwards compatible: the ``match`` and ``case``
keywords are proposed to be (and stay!) soft keywords, so their use as
variable, function, class, module or attribute names is not impeded at
all.
This is important because ``match`` is the name of a popular and
well-known function and method in the ``re`` module, which we have no
desire to break or deprecate.
The difference between hard and soft keywords is that hard keywords
are *always* reserved words, even in positions where they make no
sense (e.g. ``x = class + 1``), while soft keywords only get a special
meaning in context. Since our parser backtracks, that means that on
different attempts to parse a code fragment it could interpret a soft
keyword differently.
For example, suppose the parser encounters the following input::

  match [x, y]:

The parser first attempts to parse this as an expression statement.
It interprets ``match`` as a NAME token, and then considers ``[x,
y]`` to be a double subscript. It then encounters the colon and has
to backtrack, since an expression statement cannot be followed by a
colon. The parser then backtracks to the start of the line and finds
that ``match`` is a soft keyword allowed in this position. It then
considers ``[x, y]`` to be a list expression. The colon then is just
what the parser expected, and the parse succeeds.
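The practical effect of keeping ``match`` out of the set of hard keywords can
be checked with existing code. The sketch below uses only the standard ``re``
and ``keyword`` modules; the variable names are illustrative.

```python
import keyword
import re

# ``match`` remains usable as an ordinary identifier.
match = re.match(r"(\d+)-(\d+)", "10-20")
assert match is not None
low, high = match.groups()
print(low, high)  # 10 20

# Hard keywords are reserved everywhere; ``match`` is not one of them.
print(keyword.iskeyword("class"))  # True
print(keyword.iskeyword("match"))  # False
```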
Impacts on third-party tools
============================
There are a lot of tools in the Python ecosystem that operate on Python
source code: linters, syntax highlighters, auto-formatters, and IDEs. These
will all need to be updated to include awareness of the ``match`` statement.
In general, these tools fall into one of two categories:
**Shallow** parsers don't try to understand the full syntax of Python, but
instead scan the source code for specific known patterns. IDEs, such as Visual
Studio Code, Emacs and TextMate, tend to fall in this category, since frequently
the source code is invalid while being edited, and a strict approach to parsing
would fail.
For these kinds of tools, adding knowledge of a new keyword is relatively
easy, just an addition to a table, or perhaps modification of a regular
expression.
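As a sketch of the "modification of a regular expression" mentioned above, a
line-oriented highlighter might recognise a ``match`` statement header with a
heuristic like the following. The pattern ``MATCH_HEADER`` and the helper
``looks_like_match_stmt`` are hypothetical, and the heuristic is deliberately
approximate: it inspects a single line and can be fooled by unusual code.

```python
import re

# Heuristic: the line starts with ``match``, the next non-space character
# cannot continue an ordinary statement that uses ``match`` as a name
# (``=``, ``.``, ``:``, ``,``, ``)``, ``]``), and the line ends in a colon,
# optionally followed by a comment.
MATCH_HEADER = re.compile(r"^\s*match\b\s*[^\s=.:,)\]].*:\s*(?:#.*)?$")


def looks_like_match_stmt(line: str) -> bool:
    return MATCH_HEADER.match(line) is not None


print(looks_like_match_stmt("match value:"))            # True
print(looks_like_match_stmt("    match [x, y]:"))       # True
print(looks_like_match_stmt("match = re.match(p, s)"))  # False
print(looks_like_match_stmt("match.group(1)"))          # False
```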
**Deep** parsers understand the complete syntax of Python. An example of this
is the auto-formatter Black [9]_. A particular requirement with these kinds of
tools is that they not only need to understand the syntax of the current version
of Python, but older versions of Python as well.
The ``match`` statement uses a soft keyword, and it is one of the first major
Python features to take advantage of the capabilities of the new PEG parser. This
means that third-party parsers which are not 'PEG-compatible' will have a hard
time with the new syntax.
It has been noted that a number of these third-party tools leverage common parsing
libraries (Black for example uses a fork of the lib2to3 parser). It may be helpful
to identify widely-used parsing libraries (such as parso [10]_ and LibCST [11]_)
and upgrade them to be PEG compatible.
However, since this work would need to be done not only for the match statement,
but for *any* new Python syntax that leverages the capabilities of the PEG parser,
it is considered out of scope for this PEP. (Although it is suggested that this
would make a fine Summer of Code project.)
Reference Implementation
========================
A CPython implementation is
`currently under development <https://github.com/brandtbucher/cpython/tree/patma>`_,
and is almost entirely feature-complete.
Example Code
============
A small collection of example code is
`available on GitHub <https://github.com/gvanrossum/patma/tree/master/examples>`_.
.. _rejected ideas:
Rejected Ideas
==============
This general idea has been floating around for a pretty long time, and many
back and forth decisions were made. Here we summarize many alternative
paths that were taken but eventually abandoned.
Don't do this, pattern matching is hard to learn
------------------------------------------------
In our opinion, the proposed pattern matching is not more difficult than
adding ``isinstance()`` and ``getattr()`` to iterable unpacking. Also, we
believe the proposed syntax significantly improves readability for a wide
range of code patterns, by letting one express *what* one wants to do, rather
than *how* to do it. We hope the few real code snippets we included in the PEP
above illustrate this comparison well enough. For more real code examples
and their translations see Ref. [7]_.
Allow more flexible assignment targets instead
----------------------------------------------
There was an idea to instead just generalize the iterable unpacking to much
more general assignment targets, instead of adding a new kind of statement.
This concept is known in some other languages as "irrefutable matches". We
decided not to do this because inspection of real-life potential use cases
showed that in the vast majority of cases destructuring is tied to an ``if``
condition. Also, many of those are grouped in a series of exclusive choices.
Make it an expression
---------------------
In most other languages pattern matching is represented by an expression, not
a statement. But making it an expression would be inconsistent with other
syntactic choices in Python. Decision-making logic is expressed almost
exclusively in statements, so we decided not to deviate from this.
Use a hard keyword
------------------
There were options to make ``match`` a hard keyword, or choose a different
keyword. Although using a hard keyword would simplify life for simple-minded
syntax highlighters, we decided not to use a hard keyword for several reasons:

* Most importantly, the new parser doesn't require us to do this. Unlike
  ``async``, which caused hardships by being a soft keyword for a few releases,
  ``match`` can be made a permanent soft keyword.
* ``match`` is so commonly used in existing code that making it a hard keyword
  would break almost every existing program and would put the burden of fixing
  code on many people who may not even benefit from the new syntax.
* It is hard to find an alternative keyword that would not be commonly used
  in existing programs as an identifier, and would still clearly reflect the
  meaning of the statement.

Use ``as`` or ``|`` instead of ``case`` for case clauses
--------------------------------------------------------
The pattern matching proposed here is a combination of multi-branch control
flow (in line with ``switch`` in Algol-derived languages or ``cond`` in Lisp)
and object-deconstruction as found in functional languages. While the proposed
keyword ``case`` highlights the multi-branch aspect, alternative keywords such
as ``as`` would equally be possible, highlighting the deconstruction aspect.
``as`` or ``with``, for instance, also have the advantage of already being
keywords in Python. However, since ``case`` as a keyword can only occur as a
leading keyword inside a ``match`` statement, it is easy for a parser to
distinguish between its use as a keyword or as a variable.
Other variants would use a symbol like ``|`` or ``=>``, or go entirely without
a special marker.
Since Python is a statement-oriented language in the tradition of Algol, and as
each composite statement starts with an identifying keyword, ``case`` seemed to
be most in line with Python's style and traditions.
Use a flat indentation scheme
-----------------------------
There was an idea to use an alternative indentation scheme, for example where
every case clause would not be indented with respect to the initial ``match``
part::

  match expression:
  case pattern_1:
      ...
  case pattern_2:
      ...

Although flat indentation saves some horizontal space,
it may look awkward to the eye of a Python programmer, because everywhere else
a colon is followed by an indent. It would also complicate life for
simple-minded code editors. Finally, the horizontal space issue can be
alleviated by allowing a "half-indent" (i.e. two spaces instead of four) for
match statements.
In sample programs using ``match``, written as part of the development of this
PEP, a noticeable improvement in code brevity is observed, more than making up
for the additional indentation level.
Another proposal considered was to use flat indentation but put the
expression on the line after ``match:``, like this::

  match:
      expression
  case pattern_1:
      ...
  case pattern_2:
      ...

This was ultimately rejected because the first block would be a
novelty in Python's grammar: a block whose only content is a single
expression rather than a sequence of statements.
Alternatives for constant value pattern
---------------------------------------
This is probably the trickiest item. Matching against some pre-defined
constants is very common, but the dynamic nature of Python also makes it
ambiguous with name patterns. Five other alternatives were considered:
* Use some implicit rules. For example, if a name was defined in the global
  scope, then it refers to a constant rather than representing a name pattern::

    FOO = 1
    value = 0

    match value:
        case FOO:  # This would not be matched
            ...
        case BAR:  # This would be matched
            ...

  This however can cause surprises and action at a distance if someone
  defines an unrelated coinciding name before the match statement.

* Use a rule based on the case of a name. In particular, if the name
  starts with a lowercase letter it would be a name pattern, while if
  it starts with uppercase it would refer to a constant::

    FOO = 1
    value = 0

    match value:
        case FOO:  # This would not be matched
            ...
        case bar:  # This would be matched
            ...

  This works well with the recommendations for naming constants from
  PEP 8. The main objection is that there's no other part of core
  Python where the case of a name is semantically significant. (Then
  again a leading dot in an expression has no precedent either -- its
  use in ``import`` statements is quite different, since it resembles
  the ``.`` used to denote the current directory in filesystems.)

* Use extra parentheses to indicate lookup semantics for a given name. For
  example::

    FOO = 1
    value = 0

    match value:
        case (FOO):  # This would not be matched
            ...
        case BAR:  # This would be matched
            ...

  This may be a viable option, but it can create some visual noise if used
  often. Also honestly it looks pretty unusual, especially in nested contexts.

  This also has the problem that we may want or need parentheses to
  disambiguate grouping in patterns, e.g. in
  ``Point(x, y=(y := complex()))``.

* Introduce a special symbol, for example ``$`` or ``^``, to indicate that
  a given name is a constant to be matched against, not to be assigned to::

    FOO = 1
    value = 0

    match value:
        case $FOO:  # This would not be matched
            ...
        case BAR:  # This would be matched
            ...

  The problem with this approach is that introducing new syntax for such
  a narrow use case is probably overkill.

* There was also an idea to make lookup semantics the default, and require
  ``$`` to be used in name patterns::

    FOO = 1
    value = 0

    match value:
        case FOO:  # This would not be matched
            ...
        case $BAR:  # This would be matched
            ...

  But name patterns are more common in typical code, so having special
  syntax for the common case would be weird.

In the end, these alternatives were rejected because of the mentioned drawbacks.
Disallow float literals in patterns
-----------------------------------
Because of the inexactness of floats, an early version of this proposal
did not allow floating-point constants to be used as match patterns. Part
of the justification for this prohibition is that Rust does this.
However, during implementation, it was discovered that distinguishing between
float values and other types required extra code in the VM that would slow
matches generally. Given that Python and Rust are very different languages
with different user bases and underlying philosophies, it was felt that
allowing float literals would not cause too much harm, and would be less
surprising to users.
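The inexactness that motivated the original prohibition is easy to reproduce.
The snippet below only illustrates why an equality test against a float
literal can be surprising; it does not show pattern matching itself.

```python
import math

total = 0.1 + 0.2

# Equality with a float literal fails because of binary rounding error.
print(total == 0.3)              # False
print(math.isclose(total, 0.3))  # True

# Values that are exactly representable compare as expected.
print(0.25 + 0.25 == 0.5)        # True
```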
Range matching patterns
-----------------------
This would allow patterns such as ``1...6``. However, there are a host of
ambiguities:

* Is the range open, half-open, or closed? (I.e. is ``6`` included in the
  above example or not?)
* Does the range match a single number, or a range object?
* Range matching is often used for character ranges (``'a'...'z'``) but that
  won't work in Python since there's no character data type, just strings.
* Range matching can be a significant performance optimization if you can
  pre-build a jump table, but that's not generally possible in Python due
  to the fact that names can be dynamically rebound.

Rather than creating a special-case syntax for ranges, it was decided
that allowing custom pattern objects (``InRange(0, 6)``) would be more flexible
and less ambiguous; however those ideas have been postponed for the time
being (see `deferred ideas`_).
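The first two ambiguities can be made concrete with the existing ``range``
builtin, which is half-open, and with an explicit comparison, which can be
written as closed. The helper ``in_closed_range`` is hypothetical and only
stands in for the kind of test a guard condition could express.

```python
def in_closed_range(value, low, high):
    # Closed interval: both endpoints are included.
    return low <= value <= high


# ``range`` is half-open: the stop value is excluded.
print(6 in range(1, 6))          # False
print(in_closed_range(6, 1, 6))  # True

# A range object is itself a value, distinct from the numbers it contains.
print(range(1, 6) == range(1, 6))  # True
print(3 == range(1, 6))            # False
```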
Use dispatch dict semantics for matches
---------------------------------------
Implementations of the classic ``switch`` statement sometimes use a pre-computed
hash table instead of chained equality comparisons to gain some performance.
In the context of the ``match`` statement this is technically also possible for
matches against literal patterns. However, having subtly different semantics
for different kinds of patterns would be too surprising for a potentially
modest performance win.

We can still experiment with possible performance optimizations in this
direction if they do not cause semantic differences.
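One way the two strategies can diverge is on unhashable subjects: a chain of
equality comparisons simply fails to match, while a dictionary lookup raises.
The sketch below uses hypothetical names (``HANDLERS``, ``chained``,
``dispatch``) and is not taken from any implementation.

```python
HANDLERS = {1: "one", 2: "two"}


def chained(value):
    # Chained equality comparisons, tested one by one.
    if value == 1:
        return "one"
    elif value == 2:
        return "two"
    return "other"


def dispatch(value):
    # Pre-computed hash table lookup.
    return HANDLERS.get(value, "other")


print(chained(2), dispatch(2))  # two two
print(chained([2]))             # other
try:
    dispatch([2])
except TypeError as exc:
    # Lists are unhashable, so the lookup itself fails.
    print("TypeError:", exc)
```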
Use ``continue`` and ``break`` in case clauses
----------------------------------------------
Another rejected proposal was to define new meanings for ``continue``
and ``break`` inside of ``match``, which would have the following behavior:
* ``continue`` would exit the current case clause and continue matching
  at the next case clause.
* ``break`` would exit the match statement.

However, there is a serious drawback to this proposal: if the ``match`` statement
is nested inside of a loop, the meanings of ``continue`` and ``break`` are now
changed. This may cause unexpected behavior during refactorings; also, an
argument can be made that there are other means to get the same behavior (such
as using guard conditions), and that in practice it's likely that the existing
behavior of ``continue`` and ``break`` is far more useful.
AND (``&``) patterns
--------------------
This proposal defines an OR-pattern (``|``) to match one of several alternates;
why not also an AND-pattern (``&``)? Especially given that some other languages
(F# for example) support this.
However, it's not clear how useful this would be. The semantics for matching
dictionaries, objects and sequences already incorporates an implicit 'and': all
attributes and elements mentioned must be present for the match to succeed. Guard
conditions can also support many of the use cases that a hypothetical 'and'
operator would be used for.
In the end, it was decided that this would make the syntax more complex without
adding a significant benefit.
Negative match patterns
-----------------------
A negation of a match pattern using the operator ``!`` as a prefix would match
exactly if the pattern itself does not match. For instance, ``!(3 | 4)``
would match anything except ``3`` or ``4``.
This was rejected because there is documented evidence [8]_ that this feature
is either rarely useful (in languages which support it) or used as double
negation ``!!`` to control variable scopes and prevent variable bindings (which
does not apply to Python). It can also be simulated using guard conditions.
Check exhaustiveness at runtime
-------------------------------
The question is what to do if no case clause has a matching pattern, and
there is no default case. An earlier version of the proposal specified that
the behavior in this case would be to throw an exception rather than
silently falling through.
The arguments back and forth were many, but in the end the EIBTI (Explicit
Is Better Than Implicit) argument won out: it's better to have the programmer
explicitly throw an exception if that is the behavior they want.
For cases such as sealed classes and enums, where the patterns are all known
to be members of a discrete set, `static checkers`_ can warn about missing
patterns.
Type annotations for pattern variables
--------------------------------------
The proposal was to combine patterns with type annotations::

  match x:
      case [a: int, b: str]: print(f"An int {a} and a string {b}")
      case [a: int, b: int, c: int]: print("Three ints", a, b, c)
      ...

This idea has a lot of problems. For one, the colon can only
be used inside of brackets or parens, otherwise the syntax becomes
ambiguous. And because Python disallows ``isinstance()`` checks
on generic types, type annotations containing generics will not
work as expected.
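The restriction on generic types can be observed directly. The snippet below
only demonstrates the existing ``isinstance()`` behaviour that a runtime check
of an annotation such as ``a: List[int]`` would run into; it does not use the
rejected syntax.

```python
from typing import List

values = [1, 2, 3]

# The unparameterized class works as usual.
print(isinstance(values, list))  # True

# A parameterized generic is rejected at runtime.
try:
    isinstance(values, List[int])
except TypeError as exc:
    print("TypeError:", exc)
```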
Allow ``*rest`` in class patterns
---------------------------------
It was proposed to allow ``*rest`` in a class pattern, giving a
variable to be bound to all positional arguments at once (similar to
its use in unpacking assignments). It would provide some symmetry
with sequence patterns. But it might be confused with a feature to
provide the *values* for all positional arguments at once. And there
seems to be no practical need for it, so it was scrapped. (It could
easily be added at a later stage if a need arises.)
Disallow ``._`` and ``_.a`` in constant value patterns
------------------------------------------------------
The first public draft said that the initial name in a constant value
pattern must not be ``_`` because ``_`` has a special meaning in
pattern matching, so these would be invalid::

  case ._: ...
  case _.a: ...

(However, ``a._`` would be legal and load the attribute with name
``_`` of the object ``a`` as usual.)
There was some pushback against this on python-dev (some people have a
legitimate use for ``_`` as an important global variable, esp. in
i18n) and the only reason for this prohibition was to prevent some
user confusion. But it's not the hill to die on.
Use some other token as wildcard
--------------------------------
It has been proposed to use ``...`` (i.e., the ellipsis token) or
``*`` (star) as a wildcard. However, both these look as if an
arbitrary number of items is omitted::

  case [a, ..., z]: ...
  case [a, *, z]: ...

Both look like they would match a sequence of two or more items,
capturing the first and last values.
In addition, if ``*`` were to be used as the wildcard character, we
would have to come up with some other way to capture the rest of a
sequence, currently spelled like this::

  case [first, second, *rest]: ...

Using an ellipsis would also be more confusing in documentation and
examples, where ``...`` is routinely used to indicate something
obvious or irrelevant. (Yes, this would also be an argument against
the other uses of ``...`` in Python, but that water is already under
the bridge.)
Another proposal was to use ``?``. This could be acceptable, although
it would require modifying the tokenizer. But ``_`` is already used
as a throwaway target in other contexts, and this use is pretty
similar. This example is from ``difflib.py`` in the stdlib::

  for tag, _, _, j1, j2 in group: ...

.. _deferred ideas:
Deferred Ideas
==============
There were a number of proposals to extend the matching syntax that we
decided to postpone for a possible future PEP. These fall into the realm of
"cool idea but not essential", and it was felt that it might be better to
acquire some real-world data on how the match statement will be used in
practice before moving forward with some of these proposals.
Note that in each case, the idea was judged to be a "two-way door",
meaning that there should be no backwards-compatibility issues with adding
these features later.
One-off syntax variant
----------------------
While inspecting some code-bases that may benefit the most from the proposed
syntax, it was found that single clause matches would be used relatively often,
mostly for various special-casing. In other languages this is supported in
the form of one-off matches. We proposed to support such one-off matches too::

  if match value as pattern [and guard]:
      ...

or, alternatively, without the ``if``::

  match value as pattern [if guard]:
      ...

as equivalent to the following expansion::

  match value:
      case pattern [if guard]:
          ...

To illustrate how this will benefit readability, consider this (slightly
simplified) snippet from real code::

  if isinstance(node, CallExpr):
      if (isinstance(node.callee, NameExpr) and len(node.args) == 1 and
              isinstance(node.args[0], NameExpr)):
          call = node.callee.name
          arg = node.args[0].name
          ...  # Continue special-casing 'call' and 'arg'
  ...  # Follow with common code

This can be rewritten in a more straightforward way as::

  if match node as CallExpr(callee=NameExpr(name=call), args=[NameExpr(name=arg)]):
      ...  # Continue special-casing 'call' and 'arg'
  ...  # Follow with common code

This one-off form would not allow ``elif match`` statements, as it was only
meant to handle a single pattern case. It was intended to be a special case
of a ``match`` statement, not a special case of an ``if`` statement::

  if match value_1 as pattern_1 [and guard_1]:
      ...
  elif match value_2 as pattern_2 [and guard_2]:  # Not allowed
      ...
  elif match value_3 as pattern_3 [and guard_3]:  # Not allowed
      ...
  else:  # Also not allowed
      ...

Allowing such chains would defeat the purpose of one-off matches as a
complement to exhaustive full matches; it's better and clearer to use a full
match in this case.
Similarly, ``if not match`` would not be allowed, since ``match ... as ...`` is not
an expression. Nor do we propose a ``while match`` construct present in some languages
with pattern matching, since although it may be handy, it will likely be used
rarely.
Algebraic matching of repeated names
------------------------------------
A technique occasionally seen in functional languages like Erlang and Elixir is
to use a match variable multiple times in the same pattern::

  match value:
      case Point(x, x):
          print("Point is on a diagonal!")

The idea here is that the first appearance of ``x`` would bind the value
to the name, and subsequent occurrences would verify that the incoming
value was equal to the value previously bound. If the value was not equal,
the match would fail.
However, there are a number of subtleties involved with mixing load-store
semantics for name patterns. For the moment, we decided to make repeated
use of names within the same pattern an error; we can always relax this
restriction later without affecting backwards compatibility.
Note that you **can** use the same name more than once in alternate choices::

  match value:
      case x | [x]:
          # etc.

Extended matching protocol
--------------------------
During the initial design discussions for this PEP, there were a lot of ideas
thrown around about exotic custom matchers: ``IsInstance()``, ``InRange()``,
``RegexMatchingGroup()`` and so on. In fact, part of the proposal included
a new Python standard library module containing a menagerie of such diverse
matchers.
However, these matchers require a much more flexible and expensive custom
matching protocol. In particular, it meant that the ``__match__`` method
would need to have an additional "match signature" argument which would
let it know exactly what values the pattern was seeking.
Part of the argument against this more flexible protocol was that this
match signature argument would be expensive to construct. Due to the dynamic
nature of Python name binding, it could not be a constant, but would have
to be created anew each time; and there is no guarantee that the ``__match__``
function would even use this argument in its internal logic.
The decision to postpone this feature came with a realization that this is
not a one-way door; that an extended matching protocol could be added later,
using a variety of techniques (such as defining a new custom match magic
method with a different name) to signal that a class wishes to opt in
to the extended protocol and that the VM should compute the extended signature
object.
The authors of this PEP expect that the ``match`` statement will evolve
over time as usage patterns and idioms evolve, in a way similar to what
other "multi-stage" PEPs have done in the past. When this happens, the
extended matching issue can be revisited.
There was also an idea to pass ``__match__`` either partial context (such as
literals only) or custom pattern objects that provide the full context. For
example, the match below would generate the following call::

  match expr:
      case BinaryOp(left=Number(value=x), op=op, right=Number(value=y)):
          ...

  from types import PatternObject
  BinaryOp.__match__(
      (),
      {
          "left": PatternObject(Number, (), {"value": ...}, -1, False),
          "op": ...,
          "right": PatternObject(Number, (), {"value": ...}, -1, False),
      },
      -1,
      False,
  )

This would allow faster ``__match__()`` implementations and give better
support for customization in user-defined classes. There is however a big
downside to this: it would make the basic implementation of this method quite
complicated. Also, there would be a performance penalty if the user did not
treat the pattern object properly.
Parameterized Matching Syntax
-----------------------------
(Also known as "Class Instance Matchers".)
This is another variant of the "custom match classes" idea that would allow
diverse kinds of custom matchers mentioned in the previous section -- however,
instead of using an extended matching protocol, it would be achieved by
introducing an additional pattern type with its own syntax. This pattern type
would accept two distinct sets of parameters: one set which consists of the
actual parameters passed into the pattern object's constructor, and another
set representing the binding variables for the pattern.
The ``__match__`` method of these objects could use the constructor parameter
values in deciding what was a valid match.
This would allow patterns such as ``InRange<0, 6>(value)``, which would match
a number in the range 0..6 and assign the matched value to 'value'. Similarly,
one could have a pattern which tests for the existence of a named group in
a regular expression match result (different meaning of the word 'match').
Although there is some support for this idea, there was a lot of bikeshedding
on the syntax (there are not a lot of attractive options available)
and no clear consensus was reached, so it was decided that for now, this
feature is not essential to the PEP.
Pattern Utility Library
-----------------------
Both of the previous ideas would be accompanied by a new Python standard
library module which would contain a rich set of exotic and useful matchers.
However, it is not really possible to implement such a library without
adopting one of the extended pattern proposals given in the previous sections,
so this idea is also deferred.
References
==========
.. [1]
   https://en.wikipedia.org/wiki/Pattern_matching

.. [2]
   https://en.wikipedia.org/wiki/Algebraic_data_type

.. [3]
   https://doc.rust-lang.org/reference/patterns.html

.. [4]
   https://docs.scala-lang.org/tour/pattern-matching.html

.. [5]
   https://docs.python.org/3/library/dataclasses.html

.. [6]
   https://docs.python.org/3/library/typing.html

.. [7]
   https://github.com/gvanrossum/patma/blob/master/EXAMPLES.md

.. [8]
   https://dl.acm.org/doi/abs/10.1145/2480360.2384582

.. [9]
   https://black.readthedocs.io/en/stable/

.. [10]
   https://github.com/davidhalter/parso

.. [11]
   https://github.com/Instagram/LibCST

.. _Appendix A:
Appendix A -- Full Grammar
==========================
Here is the full grammar for ``match_stmt``. This is an additional
alternative for ``compound_stmt``. It should be understood that
``match`` and ``case`` are soft keywords, i.e. they are not reserved
words in other grammatical contexts (including at the start of a line
if there is no colon where expected). By convention, hard keywords
use single quotes while soft keywords use double quotes.
Other notation used beyond standard EBNF:
- ``SEP.RULE+`` is shorthand for ``RULE (SEP RULE)*``
- ``!RULE`` is a negative lookahead assertion
::

  match_expr:
      | star_named_expression ',' star_named_expressions?
      | named_expression
  match_stmt: "match" match_expr ':' NEWLINE INDENT case_block+ DEDENT
  case_block: "case" patterns [guard] ':' block
  guard: 'if' named_expression
  patterns: value_pattern ',' [values_pattern] | pattern
  pattern: NAME ':=' or_pattern | or_pattern
  or_pattern: '|'.closed_pattern+
  closed_pattern:
      | name_pattern
      | literal_pattern
      | constant_pattern
      | group_pattern
      | sequence_pattern
      | mapping_pattern
      | class_pattern
  name_pattern: NAME !('.' | '(' | '=')
  literal_pattern:
      | signed_number !('+' | '-')
      | signed_number '+' NUMBER
      | signed_number '-' NUMBER
      | strings
      | 'None'
      | 'True'
      | 'False'
  constant_pattern: '.' NAME !('.' | '(' | '=') | '.'? attr !('.' | '(' | '=')
  group_pattern: '(' patterns ')'
  sequence_pattern: '[' [values_pattern] ']' | '(' ')'
  mapping_pattern: '{' items_pattern? '}'
  class_pattern:
      | name_or_attr '(' ')'
      | name_or_attr '(' ','.pattern+ ','? ')'
      | name_or_attr '(' ','.keyword_pattern+ ','? ')'
      | name_or_attr '(' ','.pattern+ ',' ','.keyword_pattern+ ','? ')'
  signed_number: NUMBER | '-' NUMBER
  attr: name_or_attr '.' NAME
  name_or_attr: attr | NAME
  values_pattern: ','.value_pattern+ ','?
  items_pattern: ','.key_value_pattern+ ','?
  keyword_pattern: NAME '=' or_pattern
  value_pattern: '*' name_pattern | pattern
  key_value_pattern:
      | (literal_pattern | constant_pattern) ':' or_pattern
      | '**' name_pattern

Copyright
=========
This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.
..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: