PEP 634, PEP 635, PEP 636: Split PEP 622 in three parts (#1598)

This commit is contained in:
Guido van Rossum 2020-10-01 17:04:25 -07:00 committed by GitHub
parent 30eab3e5b4
commit 0c87b5a018
4 changed files with 2017 additions and 1 deletion


@@ -10,12 +10,13 @@ Author: Brandt Bucher <brandtbucher@gmail.com>,
     Talin <viridia@gmail.com>
 BDFL-Delegate:
 Discussions-To: Python-Dev <python-dev@python.org>
-Status: Draft
+Status: Superseded
 Type: Standards Track
 Content-Type: text/x-rst
 Created: 23-Jun-2020
 Python-Version: 3.10
 Post-History: 23-Jun-2020, 8-Jul-2020
+Superseded-By: 634
 Resolution:

pep-0634.rst Normal file

@@ -0,0 +1,653 @@
PEP: 634
Title: Structural Pattern Matching: Specification
Version: $Revision$
Last-Modified: $Date$
Author: Brandt Bucher <brandtbucher@gmail.com>,
Guido van Rossum <guido@python.org>
BDFL-Delegate:
Discussions-To: Python-Dev <python-dev@python.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 12-Sep-2020
Python-Version: 3.10
Post-History:
Replaces: 622
Resolution:
Abstract
========
**NOTE:** This draft is incomplete and not intended for review yet.
We're checking it into the peps repo for the convenience of the authors.
This PEP provides the technical specification for the ``match``
statement. It replaces PEP 622, which is hereby split in three parts:
- PEP 634: Specification
- PEP 635: Motivation and Rationale
- PEP 636: Tutorial
This PEP is intentionally devoid of commentary; all explanations of
design choices are in PEP 635. First-time readers are encouraged to
start with PEP 636, which provides a gentler introduction to the
concepts, syntax and semantics of patterns.
TODO: Maybe we should add simple examples back to each section?
There's no rule saying a spec can't include examples, and currently
it's *very* dry.
TODO: Go over the feedback from the SC and make sure everything's
somehow incorporated (either here or in PEP 635, which has to answer
why we didn't budge on most of the SC's initial requests).
Syntax and Semantics
====================
See `Appendix A`_ for the complete grammar.
Overview and terminology
------------------------
The pattern matching process takes as input a pattern (following
``case``) and a subject value (following ``match``). Phrases to
describe the process include "the pattern is matched with (or against)
the subject value" and "we match the pattern against (or with) the
subject value".
The primary outcome of pattern matching is success or failure. In
case of success we may say "the pattern succeeds", "the match
succeeds", or "the pattern matches the subject value".
In many cases a pattern contains subpatterns, and success or failure
is determined by the success or failure of matching those subpatterns
against the value (e.g., for OR patterns) or against parts of the
value (e.g., for sequence patterns). This process typically processes
the subpatterns from left to right until the overall outcome is
determined. E.g., an OR pattern succeeds at the first succeeding
subpattern, while a sequence pattern fails at the first failing
subpattern.
A secondary outcome of pattern matching may be one or more name
bindings. We may say "the pattern binds a value to a name". When
subpatterns are tried until the first success, only the bindings due to
the successful subpattern are valid; when trying until the first
failure, the bindings are merged. Several more rules, explained
below, apply to these cases.
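
As an informal illustration only (the specification itself is otherwise
deliberately example-free, per the TODO above)::

    match [0, 1, 2]:
        case (0, x, 3):       # sequence pattern: fails at its third subpattern;
            ...               # any partial binding of x is not relied upon
        case (0 | 1, *rest):  # the OR subpattern 0 | 1 succeeds at its first alternative
            print(rest)       # prints [1, 2]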
The ``match`` statement
-----------------------
Syntax::
match_stmt: "match" match_expr ':' NEWLINE INDENT case_block+ DEDENT
match_expr:
| star_named_expression ',' star_named_expressions?
| named_expression
case_block: "case" patterns [guard] ':' block
guard: 'if' named_expression
The rules ``star_named_expression``, ``star_named_expressions``,
``named_expression`` and ``block`` are part of the `standard Python
grammar <https://docs.python.org/3.10/reference/grammar.html>`_.
The rule ``patterns`` is specified below.
For context, ``match_stmt`` is a new alternative for
``compound_statement``::
compound_statement:
| if_stmt
...
| match_stmt
The ``match`` and ``case`` keywords are soft keywords, i.e. they are
not reserved words in other grammatical contexts (including at the
start of a line if there is no colon where expected). This implies
that they are recognized as keywords when part of a ``match``
statement or ``case`` block only, and are allowed to be used in all
other contexts as variable or argument names.
Match semantics
^^^^^^^^^^^^^^^
TODO: Make the language about choosing a block more precise.
The overall semantics for choosing the match is to choose the first
matching pattern (including guard) and execute the corresponding
block. The remaining patterns are not tried. If there are no
matching patterns, execution continues at the following statement.
Name bindings made during a successful pattern match outlive the
executed block and can be used after the ``match`` statement.
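
For example (informative only), bindings made by a successful match are
ordinary name bindings and remain available afterwards::

    match (1, 2):
        case (x, y):
            pass
    print(x + y)  # x and y are still bound after the match statement; prints 3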
During failed pattern matches, some subpatterns may succeed. For
example, while matching the pattern ``(0, x, 1)`` with the value ``[0,
1, 2]``, the subpattern ``x`` may succeed if the list elements are
matched from left to right. The implementation may choose to either
make persistent bindings for those partial matches or not. User code
including a ``match`` statement should not rely on the bindings being
made for a failed match, but also shouldn't assume that variables are
unchanged by a failed match. This part of the behavior is left
intentionally unspecified so different implementations can add
optimizations, and to prevent introducing semantic restrictions that
could limit the extensibility of this feature.
The precise pattern binding rules vary per pattern type and are
specified below.
.. _guards:
Guards
^^^^^^
Syntax::
case_block: "case" patterns [guard] ':' block
guard: 'if' named_expression
If a guard is present on a case block, once all patterns succeed,
the expression in the guard is evaluated.
If this raises an exception, the exception bubbles up.
Otherwise, if the condition is "truthy" the block is selected;
if it is "falsy" the next case block (if any) is tried.
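
A small informative example of guard evaluation (the names used are
illustrative only)::

    match (3, 3):
        case (x, y) if x != y:   # the pattern succeeds, but the guard is falsy,
            print("different")   # so the next case block is tried
        case (x, y):
            print("equal")       # prints "equal"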
.. _patterns:
Patterns
--------
The top-level syntax for patterns is as follows::
patterns: open_sequence_pattern | pattern
pattern: walrus_pattern | or_pattern
walrus_pattern: capture_pattern ':=' or_pattern
or_pattern: '|'.closed_pattern+
closed_pattern:
| literal_pattern
| capture_pattern
| wildcard_pattern
| constant_pattern
| group_pattern
| sequence_pattern
| mapping_pattern
| class_pattern
Walrus patterns
^^^^^^^^^^^^^^^
TODO: Change to ``or_pattern 'as' capture_pattern`` (and rename)?
Syntax::
walrus_pattern: capture_pattern ':=' or_pattern
(Note: the name on the left may not be ``_``.)
A walrus pattern matches the OR pattern on the right of the ``:=``
operator against the subject. If this fails, the walrus pattern fails.
Otherwise, the walrus pattern binds the subject to the name on the left
of the ``:=`` operator and succeeds.
OR patterns
^^^^^^^^^^^
Syntax::
or_pattern: '|'.closed_pattern+
When two or more patterns are separated by vertical bars (``|``),
this is called an OR pattern. (A single closed pattern is just that.)
Each subpattern must bind the same set of names.
An OR pattern matches each of its subpatterns in turn to the subject,
until one succeeds. The OR pattern is then deemed to succeed.
If none of the subpatterns succeed the OR pattern fails.
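
An informative example; both alternatives of the second case bind the same
single name::

    match [401, "Unauthorized"]:
        case [200, _]:
            print("ok")
        case [401, msg] | [403, msg]:  # the first succeeding alternative
            print("denied:", msg)      # determines the bindings; prints
                                       # "denied: Unauthorized"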
.. _literal_pattern:
Literal Patterns
^^^^^^^^^^^^^^^^
Syntax::
literal_pattern:
| signed_number
| signed_number '+' NUMBER
| signed_number '-' NUMBER
| strings
| 'None'
| 'True'
| 'False'
signed_number: NUMBER | '-' NUMBER
The rule ``strings`` and the token ``NUMBER`` are defined in the
standard Python grammar.
Triple-quoted strings are supported. Raw strings and byte strings
are supported. F-strings are not supported.
The forms ``signed_number '+' NUMBER`` and ``signed_number '-'
NUMBER`` are only permitted to express complex numbers; they require a
real number on the left and an imaginary number on the right.
A literal pattern succeeds if the subject value compares equal to the
value expressed by the literal, using the following comparison rules:
- Numbers and strings are compared using the ``==`` operator.
- The singleton literals ``None``, ``True`` and ``False`` are compared
using the ``is`` operator.
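
For illustration (informative), the identity rule for singletons means that
``True`` does not match the integer ``1``, while ``1.0`` does::

    match 1:
        case True:   # compared with ``is``; does not match the int 1
            print("boolean")
        case 1.0:    # compared with ``==``; 1 == 1.0, so this matches
            print("one")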
.. _capture_pattern:
Capture Patterns
^^^^^^^^^^^^^^^^
Syntax::
capture_pattern: !"_" NAME
The single underscore (``_``) is not a capture pattern (this is what
``!"_"`` expresses). It is treated as a `wildcard pattern`_.
A capture pattern always succeeds. It binds the subject value to the
name using the scoping rules for name binding established for the
walrus operator in PEP 572. (Summary: the name becomes a local
variable in the closest containing function scope unless there's an
applicable ``nonlocal`` or ``global`` statement.)
In a given pattern, a given name may be bound only once. This
disallows for example ``case x, x: ...`` but allows ``case [x] | x:
...``.
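
A minimal informative example of a capture pattern::

    match 42:
        case answer:        # a bare name always matches and binds the subject
            print(answer)   # prints 42; ``answer`` follows normal scoping rules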
.. _wildcard_pattern:
Wildcard Pattern
^^^^^^^^^^^^^^^^
Syntax::
wildcard_pattern: "_"
A wildcard pattern always succeeds. It binds no name.
.. _constant_value_pattern:
Constant Value Patterns
^^^^^^^^^^^^^^^^^^^^^^^
TODO: Rename to Value Patterns? (But ``value[s]_pattern`` is already
a grammatical rule.)
Syntax::
constant_pattern: attr
attr: name_or_attr '.' NAME
name_or_attr: attr | NAME
The dotted name in the pattern is looked up using the standard Python
name resolution rules. However, when the same constant pattern occurs
multiple times in the same ``match`` statement, the interpreter may cache
the first value found and reuse it, rather than repeat the same
lookup. (To clarify, this cache is strictly tied to a given execution
of a given ``match`` statement.)
The pattern succeeds if the value thus found compares equal to the
subject value (using the ``==`` operator).
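
An informative sketch; the ``Color`` enum is invented purely for
illustration::

    from enum import Enum

    class Color(Enum):
        RED = 1
        GREEN = 2

    match Color.GREEN:
        case Color.RED:      # dotted name: looked up, then compared with ==
            print("red")
        case Color.GREEN:
            print("green")   # prints "green"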
Group Patterns
^^^^^^^^^^^^^^
Syntax::
group_pattern: '(' pattern ')'
(For the syntax of ``pattern``, see Patterns above. Note that it
contains no comma -- a parenthesized series of items with at least one
comma is a sequence pattern, as is ``()``.)
A parenthesized pattern has no additional syntax. It allows users to
add parentheses around patterns to emphasize the intended grouping.
.. _sequence_pattern:
Sequence Patterns
^^^^^^^^^^^^^^^^^
Syntax::
sequence_pattern:
| '[' [values_pattern] ']'
| '(' [open_sequence_pattern] ')'
open_sequence_pattern: value_pattern ',' [values_pattern]
values_pattern: ','.value_pattern+ ','?
value_pattern: star_pattern | pattern
star_pattern: '*' (capture_pattern | wildcard_pattern)
(Note that a single parenthesized pattern without a trailing comma is
a group pattern, not a sequence pattern. However a single pattern
enclosed in ``[...]`` is still a sequence pattern.)
There is no semantic difference between a sequence pattern using
``[...]``, a sequence pattern using ``(...)``, and an open sequence
pattern.
A sequence pattern may contain at most one star subpattern. The star
subpattern may occur in any position. If no star subpattern is
present, the sequence pattern is a fixed-length sequence pattern;
otherwise it is a variable-length sequence pattern.
A sequence pattern fails if the subject value is not an instance of
``collections.abc.Sequence``. It also fails if the subject value is
an instance of ``str``, ``bytes`` or ``bytearray``.
A fixed-length sequence pattern fails if the length of the subject
sequence is not equal to the number of subpatterns.
A variable-length sequence pattern fails if the length of the subject
sequence is less than the number of non-star subpatterns.
The length of the subject sequence is obtained using the builtin
``len()`` function (i.e., via the ``__len__`` protocol). However, the
interpreter may cache this value in a similar manner as described for
constant value patterns.
A fixed-length sequence pattern matches the subpatterns to
corresponding items of the subject sequence, from left to right.
Matching stops (with a failure) as soon as a subpattern fails. If all
subpatterns succeed in matching their corresponding item, the sequence
pattern succeeds.
A variable-length sequence pattern first matches the leading non-star
subpatterns to the corresponding items of the subject sequence, as for
a fixed-length sequence. If this succeeds, the star subpattern
matches a list formed of the remaining subject items, with items
removed from the end corresponding to the non-star subpatterns
following the star subpattern. The remaining non-star subpatterns are
then matched to the corresponding subject items, as for a fixed-length
sequence.
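
An informative example contrasting fixed- and variable-length sequence
patterns::

    match [1, 2, 3, 4]:
        case [x, y]:                   # fixed-length: fails, the lengths differ
            print("pair", x, y)
        case [first, *middle, last]:   # variable-length: first=1, middle=[2, 3], last=4
            print(first, middle, last)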
.. _mapping_pattern:
Mapping Patterns
^^^^^^^^^^^^^^^^
Syntax::
mapping_pattern: '{' [items_pattern] '}'
items_pattern: ','.key_value_pattern+ ','?
key_value_pattern:
| (literal_pattern | constant_pattern) ':' or_pattern
| double_star_pattern
double_star_pattern: '**' capture_pattern
(Note that ``**_`` is disallowed by this syntax.)
A mapping pattern may contain at most one double star pattern,
and it must be last.
A mapping pattern may not contain duplicate key values.
(If all key patterns are literal patterns this is considered a
syntax error; otherwise this is a runtime error and will
raise ``TypeError``.)
A mapping pattern fails if the subject value is not an instance of
``collections.abc.Mapping``.
A mapping pattern succeeds if every key given in the mapping pattern
matches the corresponding item of the subject mapping. If a ``'**'
NAME`` form is present, that name is bound to a ``dict`` containing
remaining key-value pairs from the subject mapping.
If duplicate keys are detected in the mapping pattern, the pattern is
considered invalid, and a ``ValueError`` is raised.
Key-value pairs are matched using the two-argument form of the
subject's ``get()`` method. As a consequence, matched key-value pairs
must already be present in the mapping, and not created on-the-fly by
``__missing__`` or ``__getitem__``. For example,
``collections.defaultdict`` instances will only be matched by patterns
with keys that were already present when the ``match`` block was
entered.
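
An informative example of the lookup semantics, using
``collections.defaultdict``::

    from collections import defaultdict

    d = defaultdict(int, {"x": 1})
    match d:
        case {"y": value}:               # looked up as d.get("y", sentinel);
            print("has y:", value)       # __missing__ is never invoked, so this fails
        case {"x": value, **rest}:
            print(value, rest, dict(d))  # prints 1 {} {'x': 1}; no "y" key was created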
.. _class_pattern:
Class Patterns
^^^^^^^^^^^^^^
Syntax::
class_pattern:
| name_or_attr '(' [pattern_arguments ','?] ')'
pattern_arguments:
| positional_patterns [',' keyword_patterns]
| keyword_patterns
positional_patterns: ','.pattern+
keyword_patterns: ','.keyword_pattern+
keyword_pattern: NAME '=' or_pattern
(Note that positional patterns may be unparenthesized walrus patterns,
but keyword patterns may not.)
A class pattern may not repeat the same keyword multiple times.
If ``name_or_attr`` is not an instance of the builtin ``type``,
``TypeError`` is raised.
A class pattern fails if the subject is not an instance of ``name_or_attr``.
This is tested using ``isinstance()``.
If no arguments are present, the pattern succeeds if the ``isinstance()``
check succeeds. Otherwise:
- If only keyword patterns are present, they are processed as follows,
one by one:
- The keyword is looked up as an attribute on the subject.
- If this raises an exception other than ``AttributeError``,
the exception bubbles up.
- If this raises ``AttributeError``, the class pattern fails.
- Otherwise, the subpattern associated with the keyword is matched
against the attribute value. If this fails, the class pattern fails.
If it succeeds, the match proceeds to the next keyword.
- If all keyword patterns succeed, the class pattern as a whole succeeds.
- If any positional patterns are present, they are converted to keyword
patterns (see below) and treated as additional keyword patterns,
preceding the syntactic keyword patterns (if any).
Positional patterns are converted to keyword patterns using the
``__match_args__`` attribute on the class designated by ``name_or_attr``,
as follows:
- For a number of built-in types (specified below),
a single positional subpattern is accepted which will match
the entire subject; for these types no keyword patterns are accepted.
- The equivalent of ``getattr(cls, "__match_args__", ())`` is called.
- If this raises an exception the exception bubbles up.
- If the returned value is not a list or tuple, the conversion fails
and ``TypeError`` is raised.
- If there are more positional patterns than the length of
  ``__match_args__`` (as obtained using ``len()``), ``TypeError`` is raised.
- Otherwise, positional pattern ``i`` is converted to a keyword pattern
  using ``__match_args__[i]`` as the keyword,
  provided the latter is a string;
  if it is not, ``TypeError`` is raised.
- For duplicate keywords, ``TypeError`` is raised.
Once the positional patterns have been converted to keyword patterns,
the match proceeds as if there were only keyword patterns.
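
As an informative sketch of this conversion, assume a hypothetical class
``Point`` with ``__match_args__ = ("x", "y")``::

    class Point:
        __match_args__ = ("x", "y")
        def __init__(self, x, y):
            self.x, self.y = x, y

    match Point(0, 5):
        case Point(0, y):   # converted to Point(x=0, y=y) using __match_args__
            print(y)        # prints 5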
As mentioned above, for the following built-in types the handling of
positional subpatterns is different:
``bool``, ``bytearray``, ``bytes``, ``dict``, ``float``,
``frozenset``, ``int``, ``list``, ``set``, ``str``, and ``tuple``.
This behavior is roughly equivalent to the following::
class C:
__match_args__ = ["__match_self_prop__"]
@property
def __match_self_prop__(self):
return self
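
For example (informative), under this rule a single positional pattern
against ``str`` matches the subject itself rather than an attribute::

    match "hello":
        case str(s):    # the positional subpattern matches the whole subject
            print(s)    # prints "hello"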
Side effects
============
The only side-effect produced explicitly by the matching process is
the binding of names. However, the process relies on attribute
access, instance checks, ``len()``, equality and item access on the
subject and some of its components. It also evaluates constant value
patterns and the class name of class patterns. While none of those
typically create any side-effects, in theory they could. This
proposal intentionally leaves out any specification of what methods
are called or how many times. This behavior is therefore undefined
and user code should not rely on it.
The standard library
====================
To facilitate the use of pattern matching, several changes will be
made to the standard library:
- Namedtuples and dataclasses will have auto-generated
``__match_args__``.
- For dataclasses the order of attributes in the generated
``__match_args__`` will be the same as the order of corresponding
arguments in the generated ``__init__()`` method. This includes the
situations where attributes are inherited from a superclass. Fields
with ``init=False`` are excluded from ``__match_args__``.
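
For example (informative; relies on the auto-generated ``__match_args__``
described above)::

    from dataclasses import dataclass

    @dataclass
    class Point:
        x: int
        y: int

    match Point(1, 2):
        case Point(x, y):   # positional patterns follow the generated __match_args__
            print(x, y)     # prints 1 2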
In addition, a systematic effort will be put into going through
existing standard library classes and adding ``__match_args__`` where
it looks beneficial.
.. _Appendix A:
Appendix A -- Full Grammar
==========================
TODO: Go over the differences with the reference implementation and
resolve them (either by fixing the PEP or by fixing the reference
implementation).
Here is the full grammar for ``match_stmt``. This is an additional
alternative for ``compound_stmt``. Remember that ``match`` and
``case`` are soft keywords, i.e. they are not reserved words in other
grammatical contexts (including at the start of a line if there is no
colon where expected). By convention, hard keywords use single quotes
while soft keywords use double quotes.
Other notation used beyond standard EBNF:
- ``SEP.RULE+`` is shorthand for ``RULE (SEP RULE)*``
- ``!RULE`` is a negative lookahead assertion
::
match_stmt: "match" match_expr ':' NEWLINE INDENT case_block+ DEDENT
match_expr:
| star_named_expression ',' [star_named_expressions]
| named_expression
case_block: "case" patterns [guard] ':' block
guard: 'if' named_expression
patterns: open_sequence_pattern | pattern
pattern: walrus_pattern | or_pattern
walrus_pattern: capture_pattern ':=' or_pattern
or_pattern: '|'.closed_pattern+
closed_pattern:
| literal_pattern
| capture_pattern
| wildcard_pattern
| constant_pattern
| group_pattern
| sequence_pattern
| mapping_pattern
| class_pattern
literal_pattern:
| signed_number !('+' | '-')
| signed_number '+' NUMBER
| signed_number '-' NUMBER
| strings
| 'None'
| 'True'
| 'False'
signed_number: NUMBER | '-' NUMBER
capture_pattern: !"_" NAME !('.' | '(' | '=')
wildcard_pattern: "_"
constant_pattern: attr !('.' | '(' | '=')
attr: name_or_attr '.' NAME
name_or_attr: attr | NAME
group_pattern: '(' pattern ')'
sequence_pattern:
| '[' [values_pattern] ']'
| '(' [open_sequence_pattern] ')'
open_sequence_pattern: value_pattern ',' [values_pattern]
values_pattern: ','.value_pattern+ ','?
value_pattern: star_pattern | pattern
star_pattern: '*' (capture_pattern | wildcard_pattern)
mapping_pattern: '{' [items_pattern] '}'
items_pattern: ','.key_value_pattern+ ','?
key_value_pattern:
| (literal_pattern | constant_pattern) ':' or_pattern
| double_star_pattern
double_star_pattern: '**' capture_pattern
class_pattern:
| name_or_attr '(' [pattern_arguments ','?] ')'
pattern_arguments:
| positional_patterns [',' keyword_patterns]
| keyword_patterns
positional_patterns: ','.pattern+
keyword_patterns: ','.keyword_pattern+
keyword_pattern: NAME '=' or_pattern
Copyright
=========
This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:

pep-0635.rst Normal file

@@ -0,0 +1,977 @@
PEP: 635
Title: Structural Pattern Matching: Motivation and Rationale
Version: $Revision$
Last-Modified: $Date$
Author: Tobias Kohn <kohnt@tobiaskohn.ch>,
Guido van Rossum <guido@python.org>
BDFL-Delegate:
Discussions-To: Python-Dev <python-dev@python.org>
Status: Draft
Type: Informational
Content-Type: text/x-rst
Created: 12-Sep-2020
Python-Version: 3.10
Post-History:
Resolution:
Abstract
========
**NOTE:** This draft is incomplete and not intended for review yet.
We're checking it into the peps repo for the convenience of the authors.
This PEP provides the motivation and rationale for PEP 634
("Structural Pattern Matching: Specification"). First-time readers
are encouraged to start with PEP 636, which provides a gentler
introduction to the concepts, syntax and semantics of patterns.
Motivation
==========
(Structural) pattern matching syntax is found in many languages, from
Haskell, Erlang and Scala to Elixir and Ruby. (A proposal for
JavaScript is also under consideration.)
Python already supports a limited form of this through sequence
unpacking assignments, which the new proposal leverages.
Several other common Python idioms are also relevant:
- The ``if ... elif ... elif ... else`` idiom is often used to find
out the type or shape of an object in an ad-hoc fashion, using one
or more checks like ``isinstance(x, cls)``, ``hasattr(x, "attr")``,
``len(x) == n`` or ``"key" in x`` as guards to select an applicable
block. The block can then assume ``x`` supports the interface
checked by the guard. For example::
if isinstance(x, tuple) and len(x) == 2:
host, port = x
mode = "http"
elif isinstance(x, tuple) and len(x) == 3:
host, port, mode = x
# Etc.
Code like this is more elegantly rendered using ``match``::
match x:
case host, port:
mode = "http"
case host, port, mode:
pass
# Etc.
- AST traversal code often looks for nodes matching a given pattern,
for example the code to detect a node of the shape "A + B * C" might
look like this::
if (isinstance(node, BinOp) and node.op == "+"
and isinstance(node.right, BinOp) and node.right.op == "*"):
a, b, c = node.left, node.right.left, node.right.right
# Handle a + b*c
Using ``match`` this becomes more readable::
    match node:
        case BinOp("+", a, BinOp("*", b, c)):
            # Handle a + b*c
            ...
- TODO: Other compelling examples?
We believe that adding pattern matching to Python will enable Python
users to write cleaner, more readable code for examples like those
above, and many others.
Pattern matching and OO
-----------------------
Pattern matching is complementary to the object-oriented paradigm.
Using OO and inheritance we can easily define a method on a base class
that defines default behavior for a specific operation on that class,
and we can override this default behavior in subclasses. We can also
use the Visitor pattern to separate actions from data.
But this is not sufficient for all situations. For example, a code
generator may consume an AST, and have many operations where the
generated code needs to vary based not just on the class of a node,
but also on the value of some class attributes, like the ``BinOp``
example above. The Visitor pattern is insufficiently flexible for
this: it can only select based on the class.
For a complete example, see
https://github.com/gvanrossum/patma/blob/master/examples/expr.py#L231
TODO: Could we say more here?
Pattern matching and functional style
-------------------------------------
Most Python applications and libraries are not written in a consistent
OO style -- unlike Java, Python encourages defining functions at the
top-level of a module, and for simple data structures, tuples (or
named tuples or lists) and dictionaries are often used exclusively or
mixed with classes or data classes.
Pattern matching is particularly suitable for picking apart such data
structures. As an extreme example, it's easy to write code that picks
apart a JSON data structure using ``match``.
TODO: Example code.
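
As a hedged sketch of what such code might look like (the keys and the
structure below are invented purely for illustration)::

    def describe(obj):
        match obj:
            case {"name": str(name), "friends": [*friends]}:
                return f"{name} has {len(friends)} friends"
            case {"items": [first, *_]}:
                return f"collection starting with {first!r}"
            case [*elements]:
                return f"list of {len(elements)} elements"
            case _:
                return "unrecognized"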
Rationale
=========
TBD.
This section should provide the rationale for individual design decisions.
It takes the place of "Rejected ideas" in the standard PEP format.
It is organized in sections corresponding to the specification (PEP 634).
Overview and terminology
------------------------
The ``match`` statement
-----------------------
The match statement evaluates an expression to produce a subject, finds the
first pattern that matches the subject and executes the associated block
of code. Syntactically, the match statement thus takes an expression and
a sequence of case clauses, where each case clause comprises a pattern and
a block of code.
Since case clauses comprise a block of code, they adhere to the existing
indentation scheme with the syntactic structure of
``<keyword> ...: <(indented) block>``, which in turn makes it a (compound)
statement. The chosen keyword ``case`` reflects its widespread use in
pattern matching languages, ignoring those languages that use other
syntactic means such as a symbol like ``|`` because it would not fit
established Python structures. The syntax of patterns following the
keyword is discussed below.
Given that the case clauses follow the structure of a compound statement,
the match statement itself naturally becomes a compound statement as
well, following the same syntactic structure. This naturally leads to
``match <expr>: <case_clause>+``. Note that the match statement determines
a quasi-scope in which the evaluated subject is kept alive (although not in
a local variable), similar to how a with statement might keep a resource
alive during execution of its block. Furthermore, control flows from the
match statement to a case clause and then leaves the block of the match
statement. The block of the match statement thus has both syntactic and
semantic meaning.
Various suggestions have sought to eliminate or avoid the naturally arising
"double indentation" of a case clause's code block. Unfortunately, all such
proposals of *flat indentation schemes* come at the expense of violating
Python's established structural paradigm, leading to additional syntactic
rules:
- *Unindented case clauses.*
The idea is to align case clauses with the ``match``, i.e.::
match expression:
case pattern_1:
...
case pattern_2:
...
This may look awkward to the eye of a Python programmer, because
everywhere else a colon is followed by an indented block. The ``match``
statement would then follow the syntactic scheme of neither simple nor
compound statements but rather establish a category of its own.
- *Putting the expression on a separate line after ``match``.*
The idea is to use the expression yielding the subject as a statement
to avoid the singularity of ``match`` having no actual block despite
the colons::
match:
expression
case pattern_1:
...
case pattern_2:
...
This was ultimately rejected because the first block would be another
novelty in Python's grammar: a block whose only content is a single
expression rather than a sequence of statements. Attempts to amend this
issue by adding or repurposing yet another keyword along the lines of
``match: return expression`` did not yield any satisfactory solution.
Although flat indentation would save some horizontal space, the cost of
increased complexity or unusual rules is too high. It would also complicate
life for simple-minded code editors. Finally, the horizontal space issue can
be alleviated by allowing "half-indent" (i.e. two spaces instead of four)
for match statements.
In sample programs using match, written as part of the development of this
PEP, a noticeable improvement in code brevity is observed, more than making
up for the additional indentation level.
*Statement vs. Expression.* Some suggestions centered around the idea of
making ``match`` an expression rather than a statement. However, this
would fit poorly with Python's statement-oriented nature and lead to
unusually long and complex expressions with the need to invent new
syntactic constructs or break well established syntactic rules. An
obvious consequence of ``match`` as an expression would be that case
clauses could no longer have arbitrary blocks of code attached, but only
a single expression. Overall, the strong limitations could in no way
offset the slight simplification in some special use cases.
Match semantics
~~~~~~~~~~~~~~~
The patterns of different case clauses might overlap in that more than
one case clause would match a given subject. The first-to-match rule
ensures that the selection of a case clause for a given subject is
unambiguous. Furthermore, case clauses can have increasingly general
patterns matching wider classes of subjects. The first-to-match rule
then ensures that the most precise pattern can be chosen (although it
is the programmer's responsibility to order the case clauses correctly).
In a statically typed language, the match statement would be compiled to
a decision tree to select a matching pattern quickly and very efficiently.
This would, however, require that all patterns be purely declarative and
static, running against the established dynamic semantics of Python. The
proposed semantics thus represent a path incorporating the best of both
worlds: patterns are tried in a strictly sequential order so that each
case clause constitutes an actual statement. At the same time, we allow
the interpreter to cache any information about the subject or change the
order in which subpatterns are tried. In other words: if the interpreter
has found that the subject is not an instance of a class ``C``, it can
directly skip case clauses testing for this again, without having to
perform repeated instance-checks. If a guard stipulates that a variable
``x`` must be positive, say (i.e. ``if x > 0``), the interpreter might
check this directly after binding ``x`` and before any further
subpatterns are considered.
*Binding and scoping.* In many pattern matching implementations, each
case clause would establish a separate scope of its own. Variables bound
by a pattern would then only be visible inside the corresponding case block.
In Python, however, this does not make sense. Establishing separate scopes
would essentially mean that each case clause is a separate function without
direct access to the variables in the surrounding scope (without having to
resort to ``nonlocal`` that is). Moreover, a case clause could no longer
influence any surrounding control flow through standard statements such as
``return`` or ``break``. Hence, such strict scoping would lead to
unintuitive and surprising behavior.
A direct consequence of this is that any variable bindings outlive the
respective case or match statements. Even patterns that only match a
subject partially might bind local variables (this is, in fact, necessary
for guards to function properly). However, this escaping of variable
bindings is in line with existing Python structures such as for loops and
with statements.
.. _patterns:
Patterns
--------
Patterns fulfill two purposes: they impose (structural) constraints on
the subject and they specify which data values should be extracted from
the subject and bound to variables. In iterable unpacking, which can be
seen as a prototype to pattern matching in Python, there is only one
*structural pattern* to express sequences while there is a rich set of
*binding patterns* to assign a value to a specific variable or field.
Full pattern matching differs from this in that there is more variety
in structural patterns but only a minimum of binding patterns.
Patterns differ from assignment targets (as in iterable unpacking) in that
they impose additional constraints on the structure of the subject and in
that a subject might safely fail to match a specific pattern at any point
(in iterable unpacking, this constitutes an error). The latter means that
patterns should avoid side effects wherever possible, including binding
values to attributes or subscripts.
A cornerstone of pattern matching is the possibility of arbitrarily
*nesting patterns*. The nesting allows for expressing deep
tree structures (for an example of nested class patterns, see the motivation
section above) as well as alternatives.
Although the structural patterns might superficially look like expressions,
it is important to keep in mind that there is a clear distinction. In fact,
no pattern is or contains an expression. It is more productive to think of
patterns as declarative elements similar to the formal parameters in a
function definition.
Walrus patterns
~~~~~~~~~~~~~~~
OR patterns
~~~~~~~~~~~
The OR pattern allows you to combine 'structurally equivalent' alternatives
into a new pattern, i.e. several patterns can share a common handler. If any
one of an OR pattern's subpatterns matches the given subject, the entire OR
pattern succeeds.
Statically typed languages prohibit the binding of names (capture patterns)
inside an OR pattern because of potential conflicts concerning the types of
variables. As a dynamically typed language, Python can be less restrictive
here and allow capture patterns inside OR patterns. However, each subpattern
must bind the same set of variables so as not to leave potentially undefined
names. With two alternatives ``P | Q``, this means that if *P* binds the
variables *u* and *v*, *Q* must bind exactly the same variables *u* and *v*.
There was some discussion on whether to use the bar ``|`` or the keyword
``or`` in order to separate alternatives. The OR pattern does not fully fit
the existing semantics and usage of either of these two symbols. However,
``|`` is the symbol of choice in all programming languages with support of
the OR pattern and is even used in that capacity for regular expressions in
Python as well. Moreover, ``|`` is not only used for bitwise OR, but also
for set unions and dict merging (:pep:`584`).
Other alternatives were considered as well, but none of these would allow
OR-patterns to be nested inside other patterns:
- *Using a comma*::
case 401, 403, 404:
print("Some HTTP error")
This looks too much like a tuple -- we would have to find a different way
to spell tuples, and the construct would have to be parenthesized inside
the argument list of a class pattern. In general, commas already have many
different meanings in Python; we shouldn't add more.
- *Using stacked cases*::
case 401:
case 403:
case 404:
print("Some HTTP error")
This is how this would be done in *C*, using its fall-through semantics
for cases. However, we don't want to mislead people into thinking that
match/case uses fall-through semantics (which are a common source of bugs
in *C*). Also, this would be a novel indentation pattern, which might make
it harder to support in IDEs and such (it would break the simple rule "add
an indentation level after a line ending in a colon"). Finally, this
would not support OR patterns nested inside other patterns.
- *Using ``case in`` followed by a comma-separated list*::
case in 401, 403, 404:
print("Some HTTP error")
This would not work for OR patterns nested inside other patterns, like::
case Point(0|1, 0|1):
print("A corner of the unit square")
*AND and NOT patterns.*
This proposal defines an OR-pattern (``|``) to match one of several
alternatives; why not also an AND-pattern (``&``) or even a NOT-pattern
(``!``), especially given that some other languages (F#, for example)
support AND-patterns?
However, it is not clear how useful this would be. The semantics for matching
dictionaries, objects and sequences already incorporates an implicit 'and':
all attributes and elements mentioned must be present for the match to
succeed. Guard conditions can also support many of the use cases that a
hypothetical 'and' operator would be used for.
A negation of a match pattern using the operator ``!`` as a prefix would match
exactly if the pattern itself does not match. For instance, ``!(3 | 4)``
would match anything except ``3`` or ``4``. However, there is evidence from
other languages that this is rarely useful and primarily used as double
negation ``!!`` to control variable scopes and prevent variable bindings
(which does not apply to Python).
In the end, it was decided that this would make the syntax more complex
without adding a significant benefit.
Example::
def simplify(expr):
match expr:
case ('/', 0, 0):
return expr
case ('*' | '/', 0, _):
return 0
case ('+' | '-', x, 0) | ('+', 0, x) | ('*', 1, x) | ('*' | '/', x, 1):
return x
return expr
.. _capture_pattern:
Capture Patterns
~~~~~~~~~~~~~~~~
Capture patterns take on the form of a name that accepts any value and binds
it to a (local) variable (unless the name is declared as ``nonlocal`` or
``global``). In that sense, a simple capture pattern is basically equivalent
to a parameter in a function definition (when the function is called, each
parameter binds the respective argument to a local variable in the function's
scope).
A name used for a capture pattern must not coincide with another capture
pattern in the same pattern. This, again, is similar to parameters, which
equally require each parameter name to be unique within the list of
parameters. It differs, however, from iterable unpacking assignment, where
the repeated use of a variable name as target is permissible (e.g.,
``x, x = 1, 2``). The rationale for not supporting ``(x, x)`` in patterns
is its ambiguous reading: it could be seen as in iterable unpacking where
only the second binding to ``x`` survives. But it could be equally seen as
expressing a tuple with two equal elements (which comes with its own issues).
Should the need arise, then it is still possible to introduce support for
repeated use of names later on.
There were calls to explicitly mark capture patterns and thus identify them
as binding targets. According to that idea, a capture pattern would be
written as, e.g. ``?x`` or ``$x``. The aim of such explicit capture markers
is to let an unmarked name be a constant value pattern (see below). However,
this is based on the misconception that pattern matching was an extension of
*switch* statements, placing the emphasis on fast switching based on
(ordinal) values. Such a *switch* statement has indeed been proposed for
Python before (see :pep:`275` and :pep:`3103`). Pattern matching, on the other
hand, builds a generalized concept of iterable unpacking. Binding values
extracted from a data structure is at the very core of the concept and hence
the most common use case. Explicit markers for capture patterns would thus
betray the objective of the proposed pattern matching syntax and simplify
a secondary use case at the expense of additional syntactic clutter for
core cases.
Example::
def average(*args):
match args:
case [x, y]: # captures the two elements of a sequence
return (x + y) / 2
case [x]: # captures the only element of a sequence
return x
case []:
return 0
case x: # captures the entire sequence
return sum(x) / len(x)
.. _wildcard_pattern:
Wildcard Pattern
~~~~~~~~~~~~~~~~
The wildcard pattern is a special case of a 'capture' pattern: it accepts
any value, but does not bind it to a variable. The idea behind this rule
is to support repeated use of the wildcard in patterns. While ``(x, x)``
is an error, ``(_, _)`` is legal.
Particularly in larger (sequence) patterns, it is important to allow the
pattern to concentrate on values with actual significance while ignoring
anything else. Without a wildcard, it would become necessary to 'invent'
a number of local variables, which would be bound but never used. Even
when sticking to naming conventions and using e.g. ``_1, _2, _3`` to name
irrelevant values, say, this still introduces visual clutter and can hurt
performance (compare the sequence pattern ``(x, y, *z)`` to ``(_, y, *_)``,
where the ``*z`` forces the interpreter to copy a potentially very long
sequence, whereas the second version simply compiles to code along the
lines of ``y = seq[1]``).
There has been much discussion about the choice of the underscore as ``_``
as a wildcard pattern, i.e. making this one name non-binding. However, the
underscore is already heavily used as an 'ignore value' marker in iterable
unpacking. Since the wildcard pattern ``_`` never binds, this use of the
underscore does not interfere with other uses such as inside the REPL or
the ``gettext`` module.
It has been proposed to use ``...`` (i.e., the ellipsis token) or ``*``
(star) as a wildcard. However, both these look as if an arbitrary number
of items is omitted::
case [a, ..., z]: ...
case [a, *, z]: ...
Both look as if they would match a sequence of two or more items,
capturing the first and last values.
A single wildcard clause (i.e. ``case _:``) is semantically equivalent to
an ``else:``. It accepts any subject without binding it to a variable or
performing any other operation. However, unlike ``else``, the wildcard
pattern can be used as a subpattern inside nested patterns.
Finally, note that the underscore is used as a wildcard pattern in *every*
programming language with pattern matching that we could find
(including *C#*, *Elixir*, *Erlang*, *F#*, *Grace*, *Haskell*,
*Mathematica*, *OCaml*, *Ruby*, *Rust*, *Scala*, *Swift*, and *Thorn*).
Keeping in mind that many users of Python also work with other programming
languages, bring prior experience from them when learning Python, or move
on to other languages after having learnt Python, we find that such well
established standards are important and relevant with respect to
readability and learnability. In our view, concerns that this wildcard
gives one ordinary name special treatment are not strong enough to justify
introducing syntax that would make Python the exception.
Example::
def is_closed(sequence):
match sequence:
case [_]: # any sequence with a single element
return True
case [start, *_, end]: # a sequence with at least two elements
return start == end
case _: # anything
return False
.. _literal_pattern:
Literal Patterns
~~~~~~~~~~~~~~~~
Literal patterns are a convenient way for imposing constraints on the
value of a subject, rather than its type or structure. Literal patterns
even allow you to emulate a switch statement using pattern matching.
Generally, the subject is compared to a literal pattern by means of standard
equality (``x == y`` in Python syntax). Consequently, the literal patterns
``1.0`` and ``1`` match exactly the same set of objects, i.e. ``case 1.0:``
and ``case 1:`` are fully interchangeable. In principle, ``True`` would also
match the same set of objects because ``True == 1`` holds. However, we
believe that many users would be surprised to find that ``case True:``
matched the object ``1.0``, resulting in some subtle bugs and convoluted
workarounds. We therefore adopted the rule that the three singleton
objects ``None``, ``False`` and ``True`` match by identity (``x is y`` in
Python syntax) rather than equality. Hence, ``case True:`` will match only
``True`` and nothing else. Note that ``case 1:`` would still match ``True``,
though, because the literal pattern ``1`` works by equality and not identity.
Early ideas to induce a hierarchy on numbers, so that ``case 1.0:`` would
match both the integer ``1`` and the floating point number ``1.0`` whereas
``case 1:`` would only match the integer ``1``, were eventually dropped in
favor of the simpler and more consistent rule based on equality. Moreover, any
additional check of whether the subject is an instance of ``numbers.Integral``
would come at a high runtime cost merely to introduce a distinction that
would essentially be novel in Python. When needed, the explicit syntax ``case int(1):`` might
be used.
Recall that literal patterns are *not* expressions, but directly denote a
specific value or object. From a syntactical point of view, we have to
ensure that negative and complex numbers can equally be used as patterns,
although they are not atomic literal values (i.e. the seeming literal value
``-3+4j`` would syntactically be an expression of the form
``BinOp(UnaryOp('-', 3), '+', 4j)``, but as expressions are not part of
patterns, we added syntactic support for such complex value literals without
having to resort to full expressions). Interpolated *f*-strings, on the
other hand, are not literal values, despite their appearance and can
therefore not be used as literal patterns (string concatenation, however,
is supported).
Literal patterns not only occur as patterns in their own right, but also
as keys in *mapping patterns*.
Example::
def simplify(expr):
match expr:
case ('+', 0, x):
return x
case ('+' | '-', x, 0):
return x
case ('and', True, x):
return x
case ('and', False, x):
return False
case ('or', False, x):
return x
case ('or', True, x):
return True
case ('not', ('not', x)):
return x
return expr
.. _constant_value_pattern:
Constant Value Patterns
~~~~~~~~~~~~~~~~~~~~~~~
It is good programming style to use named constants for parametric values or
to clarify the meaning of particular values. Clearly, it would be desirable
to also write ``case (HttpStatus.OK, body):`` rather than
``case (200, body):``, for example. The main issue that arises here is how to
distinguish capture patterns (variables) from constant value patterns. The
general discussion surrounding this issue has brought forward a plethora of
options, which we cannot all fully list here.
Strictly speaking, constant value patterns are not really necessary, but
could be implemented using guards, i.e.
``case (status, body) if status == HttpStatus.OK:``. Nonetheless, the
convenience of constant value patterns is unquestioned and obvious.
The observation that constants tend to be written in uppercase letters or
collected in enumeration-like namespaces suggests possible rules to discern
constants syntactically. However, the idea of using upper vs. lower case as
a marker has been met with scepticism since there is no similar precedent
in core Python (although it is common in other languages). We therefore only
adopted the rule that any dotted name (i.e. attribute access) is to be
interpreted as a constant value pattern like ``HttpStatus.OK``
above. This precludes, in particular, local variables from acting as
constants.
Global variables can only be directly used as constants when defined in other
modules, although there are workarounds to access the current module as a
namespace as well. A proposed rule to use a leading dot (e.g.
``.CONSTANT``) for that purpose was criticised because it was felt that the
dot would not be a visible-enough marker for that purpose. Partly inspired
by use cases in other programming languages, a number of different
markers/sigils were proposed (such as ``^CONSTANT``, ``$CONSTANT``,
``==CONSTANT``, ``CONSTANT?``, or the word enclosed in backticks), although
there was no obvious or natural choice. The current proposal therefore
leaves the discussion and possible introduction of such a 'constant' marker
for future PEPs.
Distinguishing the semantics of a name based on whether it is a global
variable (i.e. having the compiler treat global variables as constants
rather than capture patterns) leads to various issues. The addition or alteration
of a global variable in the module could have unintended side effects on
patterns. Moreover, pattern matching could not be used directly inside a
module's scope because all variables would be global, making capture
patterns impossible.
Example::
def handle_reply(reply):
match reply:
case (HttpStatus.OK, MimeType.TEXT, body):
process_text(body)
case (HttpStatus.OK, MimeType.APPL_ZIP, body):
text = deflate(body)
process_text(text)
case (HttpStatus.MOVED_PERMANENTLY, new_URI):
resend_request(new_URI)
case (HttpStatus.NOT_FOUND):
raise ResourceNotFound()
Group Patterns
~~~~~~~~~~~~~~
Allowing users to explicitly specify the grouping is particularly helpful
in case of OR patterns.
.. _sequence_pattern:
Sequence Patterns
~~~~~~~~~~~~~~~~~
Sequence patterns follow as closely as possible the already established
syntax and semantics of iterable unpacking. Of course, subpatterns take
the place of assignment targets (variables, attributes and subscript).
Moreover, the sequence pattern only matches a carefully selected set of
possible subjects, whereas iterable unpacking can be applied to any
iterable.
- As in iterable unpacking, we do not distinguish between 'tuple' and
'list' notation. ``[a, b, c]``, ``(a, b, c)`` and ``a, b, c`` are all
equivalent. While this means we have a redundant notation and checking
specifically for lists or tuples requires more effort (e.g.
``case list([a, b, c])``), we mimic iterable unpacking as much as
possible.
- A starred pattern will capture a sub-sequence of arbitrary length,
mirroring iterable unpacking as well. Only one starred item may be
present in any sequence pattern. In theory, patterns such as ``(*_, 3, *_)``
could be understood as expressing any sequence containing the value ``3``.
In practice, however, this would only work for a very narrow set of use
cases and lead to inefficient backtracking or even ambiguities otherwise.
- The sequence pattern does *not* iterate through an iterable subject. All
elements are accessed through subscripting and slicing, and the subject must
be an instance of ``collections.abc.Sequence`` (including, in particular,
lists and tuples, but excluding strings and bytes, as well as sets and
dictionaries).
A sequence pattern cannot just iterate through any iterable object. The
consumption of elements from the iteration would have to be undone if the
overall pattern fails, which is not possible.
Relying on ``len()`` and subscripting and slicing alone does not work to
identify sequences because sequences share the protocol with more general
maps (dictionaries) in this regard. It would be surprising if a sequence
pattern also matched dictionaries or other custom objects that implement
the mapping protocol (i.e. ``__getitem__``). The interpreter therefore
performs an instance check to ensure that the subject in question really
is a sequence (of known type).
String and bytes objects have a dual nature: they are both 'atomic' objects
in their own right, as well as sequences (with a strongly recursive nature
in that a string is a sequence of strings). The typical behavior and use
cases for strings and bytes are different enough from that of tuples and
lists to warrant a clear distinction. It is in fact often unintuitive and
unintended that strings pass for sequences as evidenced by regular questions
and complaints. Strings and bytes are therefore not matched by a sequence
pattern, limiting the sequence pattern to a very specific understanding of
'sequence'.
.. _mapping_pattern:
Mapping Patterns
~~~~~~~~~~~~~~~~
Dictionaries or mappings in general are one of the most important and most
widely used data structures in Python. In contrast to sequences, mappings
are built for fast direct access to arbitrary elements (identified by a key).
In most use cases an element is retrieved from a dictionary by a known key
without regard for any ordering or other key-value pairs stored in the same
dictionary. Particularly common are string keys.
The mapping pattern reflects the common usage of dictionary lookup: it allows
the user to extract some values from a mapping by means of constant/known
keys and have the values match given subpatterns. Moreover, the mapping
pattern does not check for the presence of additional keys. Should it be
necessary to impose an upper bound on the mapping and ensure that no
additional keys are present, then the usual double-star-pattern ``**rest``
can be used. The special case ``**_`` with a wildcard, however, is not
supported as it would not have any effect, but might lead to a wrong
understanding of the mapping pattern's semantics.
To avoid overly expensive matching algorithms, keys must be literals or
constant values.
Example::
def change_red_to_blue(json_obj):
match json_obj:
case { 'color': ('red' | '#FF0000') }:
json_obj['color'] = 'blue'
case { 'children': children }:
for child in children:
change_red_to_blue(child)
.. _class_pattern:
Class Patterns
~~~~~~~~~~~~~~
Class patterns fulfil two purposes: checking whether a given subject is
indeed an instance of a specific class and extracting data from specific
attributes of the subject. A quick survey revealed that ``isinstance()``
is indeed one of the most often used functions in Python in terms of
static occurrences in programs. Such instance checks typically precede
a subsequent access to information stored in the object, or a possible
manipulation thereof. A typical pattern might be along the lines of::
def traverse_tree(node):
if isinstance(node, Node):
traverse_tree(node.left)
traverse_tree(node.right)
elif isinstance(node, Leaf):
print(node.value)
In many cases, however, class patterns occur nested as in the example
given in the motivation::
if (isinstance(node, BinOp) and node.op == "+"
and isinstance(node.right, BinOp) and node.right.op == "*"):
a, b, c = node.left, node.right.left, node.right.right
# Handle a + b*c
The class pattern lets you concisely specify both an instance check as
well as relevant attributes (with possible further constraints). It is
thereby very tempting to write, e.g., ``case Node(left, right):`` in the
first case above and ``case Leaf(value):`` in the second. While this
indeed works well for languages with strict algebraic data types, it is
problematic with the structure of Python objects.
When dealing with general Python objects, we face a potentially very large
number of unordered attributes: an instance of ``Node`` contains a large
number of attributes (most of which are 'private' methods such as
``__repr__``). Moreover, the interpreter cannot reliably deduce which of
the attributes comes first and which comes second. For an object that
represents a circle, say, there is no inherently obvious ordering of the
attributes ``x``, ``y`` and ``radius``.
We envision two possibilities for dealing with this issue: either explicitly
name the attributes of interest or provide an additional mapping that tells
the interpreter which attributes to extract and in which order. Both
approaches are supported. Moreover, explicitly naming the attributes of
interest lets you further specify the required structure of an object; if
an object lacks an attribute specified by the pattern, the match fails.
- Attributes that are explicitly named pick up the syntax of named arguments.
If an object of class ``Node`` has two attributes ``left`` and ``right``
as above, the pattern ``Node(left=x, right=y)`` will extract the values of
both attributes and assign them to ``x`` and ``y``, respectively. The data
flow from left to right seems unusual, but is in line with mapping patterns
and has precedents such as assignments via ``as`` in *with*- or
*import*-statements.
Naming the attributes in question explicitly will be mostly used for more
complex cases where the positional form (below) is insufficient.
- The class field ``__match_args__`` specifies a number of attributes
together with their ordering, allowing class patterns to rely on positional
sub-patterns without having to explicitly name the attributes in question.
This is particularly handy for smaller objects or instances of data classes,
where the attributes of interest are rather obvious and often have a
well-defined ordering. In a way, ``__match_args__`` is similar to the
declaration of formal parameters, which allows functions to be called with
positional arguments rather than naming all the parameters.
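
An informative example combining both forms; the ``Node`` class here is
hypothetical and defined only for illustration::

    class Node:
        __match_args__ = ("left", "right")
        def __init__(self, left, right):
            self.left, self.right = left, right

    match Node(1, Node(2, 3)):
        case Node(left=l, right=Node()):  # keyword form names the attributes explicitly
            print("right child is a Node; left =", l)
        case Node(l, r):                  # positional form relies on __match_args__
            print(l, r)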
The syntax of class patterns is based on the idea that de-construction
mirrors the syntax of construction. This is already the case in virtually
any Python construct, be it assignment targets, function definitions or
iterable unpacking. In all these cases, we find that the syntax for
sending and that for receiving 'data' are virtually identical.
- Assignment targets such as variables, attributes and subscripts:
``foo.bar[2] = foo.bar[3]``;
- Function definitions: a function defined with ``def foo(x, y, z=6)``
is called as, e.g., ``foo(123, y=45)``, where the actual arguments
provided at the call site are matched against the formal parameters
at the definition site;
- Iterable unpacking: ``a, b = b, a`` or ``[a, b] = [b, a]`` or
``(a, b) = (b, a)``, just to name a few equivalent possibilities.
Using the same syntax for reading and writing, l- and r-values, or
construction and de-construction is widely accepted for its benefits in
thinking about data, its flow and manipulation. This equally extends to
the explicit construction of instances, where class patterns ``c(p, q)``
deliberately mirror the syntax of creating instances.
History and Context
===================
Pattern matching emerged in the late 1970s in the form of tuple unpacking
and as a means to handle recursive data structures such as linked lists or
trees (object-oriented languages usually use the visitor pattern for handling
recursive data structures). The early proponents of pattern matching
organised structured data in 'tagged tuples' rather than ``struct`` as in
*C* or the objects introduced later. A node in a binary tree would, for
instance, be a tuple with two elements for the left and right branches,
respectively, and a ``Node`` tag, written as ``Node(left, right)``. In
Python we would probably put the tag inside the tuple as
``('Node', left, right)`` or define a data class ``Node`` to achieve the
same effect.
Using modern syntax, a depth-first tree traversal would then be written as
follows::
    def traverse_tree(node):
        match node:
            case Node(left, right):
                traverse_tree(left)
                traverse_tree(right)
            case Leaf(value):
                handle(value)
The notion of handling recursive data structures with pattern matching
immediately gave rise to the idea of handling more general recursive
'patterns' (i.e. recursion beyond recursive data structures)
with pattern matching. Pattern matching would thus also be used to define
recursive functions such as::
    def fib(arg):
        match arg:
            case 0:
                return 1
            case 1:
                return 1
            case n:
                return fib(n-1) + fib(n-2)
As pattern matching was repeatedly integrated into new and emerging
programming languages, its syntax slightly evolved and expanded. The first
two cases in the ``fib`` example above could be written more succinctly
as ``case 0 | 1:`` with ``|`` denoting alternative patterns. Moreover, the
underscore ``_`` was widely adopted as a wildcard, a filler where neither
the structure nor value of parts of a pattern were of substance. Since the
underscore is already frequently used in equivalent capacity in Python's
iterable unpacking (e.g., ``_, _, third, *_ = something``), we kept these
universal conventions.
It is noteworthy that the concept of pattern matching has always been
closely linked to the concept of functions. The different case clauses
have always been considered as something like semi-independent functions
where pattern variables take on the role of parameters. This becomes
most apparent when pattern matching is written as an overloaded function,
along the lines of (Standard ML)::
    fun fib 0 = 1
      | fib 1 = 1
      | fib n = fib (n-1) + fib (n-2)
Even though such a strict separation of case clauses into independent
functions does not make sense in Python, we find that patterns share many
syntactic rules with parameters, such as binding arguments to unqualified
names only, or the rule that variable/parameter names must not be repeated
within a particular pattern/function.
With its emphasis on abstraction and encapsulation, object-oriented
programming posed a serious challenge to pattern matching. In short: in
object-oriented programming, we can no longer view objects as tagged tuples.
The arguments passed into the constructor do not necessarily specify the
attributes or fields of the objects. Moreover, there is no longer a strict
ordering of an object's fields and some of the fields might be private and
thus inaccessible. And on top of this, the given object might actually be
an instance of a subclass with slightly different structure.
To address this challenge, patterns became increasingly independent of the
original tuple constructors. In a pattern like ``Node(left, right)``,
``Node`` is no longer a passive tag, but rather a function that can actively
check for any given object whether it has the right structure and extract a
``left`` and ``right`` field. In other words: the ``Node``-tag becomes a
function that transforms an object into a tuple or returns some failure
indicator if it is not possible.
In Python, we simply use ``isinstance()`` together with the ``__match_args__``
field of a class to check whether an object has the correct structure and
then transform some of its attributes into a tuple. For the ``Node`` example
above, for instance, we would have ``__match_args__ = ('left', 'right')`` to
indicate that these two attributes should be extracted to form the tuple.
That is, ``case Node(x, y)`` would first check whether a given object is an
instance of ``Node`` and then bind the object's ``left`` attribute to ``x``
and its ``right`` attribute to ``y``, respectively.
Paying tribute to Python's dynamic nature with 'duck typing', however, we
also added a more direct way to specify the presence of, or constraints on,
specific attributes. Instead of ``Node(x, y)`` you could also write
``object(left=x, right=y)``, effectively eliminating the ``isinstance()``
check and thus supporting any object with ``left`` and ``right`` attributes.
Or you would combine these ideas to write ``Node(right=y)`` so as to require
an instance of ``Node`` but only extract the value of the ``right`` attribute.
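As a rough sketch of these last two forms (the ``Node`` class and the
``describe`` helper are illustrative only)::

    from types import SimpleNamespace

    class Node:
        __match_args__ = ("left", "right")

        def __init__(self, left, right):
            self.left, self.right = left, right

    def describe(obj):
        match obj:
            case Node(right=r):             # must be a Node; only 'right' is extracted
                return f"a Node with right={r!r}"
            case object(left=l, right=r):   # duck typing: any object with both attributes
                return f"something with left={l!r} and right={r!r}"
            case _:
                return "no match"

    print(describe(Node(1, 2)))                            # a Node with right=2
    print(describe(SimpleNamespace(left="a", right="b")))  # something with left='a' and right='b'
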
Copyright
=========
This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:

385
pep-0636.rst Normal file
View File

@ -0,0 +1,385 @@
PEP: 636
Title: Structural Pattern Matching: Tutorial
Version: $Revision$
Last-Modified: $Date$
Author: Daniel F Moisset <dfmoisset@gmail.com>,
Tobias Kohn <kohnt@tobiaskohn.ch>
Sponsor: Guido van Rossum <guido@python.org>
BDFL-Delegate:
Discussions-To: Python-Dev <python-dev@python.org>
Status: Draft
Type: Informational
Content-Type: text/x-rst
Created: 12-Sep-2020
Python-Version: 3.10
Post-History:
Resolution:
Abstract
========
**NOTE:** This draft is incomplete and not intended for review yet.
We're checking it into the peps repo for the convenience of the authors.
This PEP is a tutorial for the pattern matching introduced by PEP 634.
PEP 622 proposed syntax for pattern matching, which received detailed discussion
both from the community and the Steering Council. A frequent concern was
about how easy it would be to explain (and learn) this feature. This PEP
addresses that concern by providing the kind of document which developers could use
to learn about pattern matching in Python.
This is considered supporting material for PEP 634 (the technical specification
for pattern matching) and PEP 635 (the motivation and rationale for having pattern
matching and design considerations).
Meta
====
This section is intended to get the co-authors in sync about style and
language. It should be removed from the released PEP.
The following are design decisions I made while writing this:
1. Who is the target audience?
I'm considering "People with general Python experience" (i.e. who shouldn't be surprised
at anything in the Python tutorial), but not necessarily involved with the
design/development of Python. I'm assuming someone who hasn't been exposed to pattern
matching in other languages.
2. How detailed should this document be?
I considered a range from "very superficial" (like the detail level you might find about
statements in the Python tutorial) to "terse but complete" like
https://github.com/gvanrossum/patma/#tutorial
to "long and detailed". I chose the later, we can always trim down from that.
3. What kind of examples to use?
I tried to write examples of code that I might actually write using pattern matching. I
avoided going
for a full application (because the examples I have in mind are too large for a PEP) but
I tried to follow ideas related to a single project to thread the story-telling more
easily. This is probably the most controversial thing here, and if the rest of
the authors dislike it, we can change to a more formal explanatory style.
Other rules I'm following (let me know if I forgot any):
* I'm not going to reference/compare with other languages
* I'm not trying to convince the reader that this is a good idea (that's the job of
PEP 635), just explain how to use it.
* I'm not trying to cover every corner case (that's the job of PEP 634), just cover
how to use the full functionality in the "normal" cases.
* I talk to the learner in second person
Tutorial
========
As an example to motivate this tutorial, you will be writing a text-adventure. That is
a form of interactive fiction where the user enters text commands to interact with a
fictional world and receives text descriptions of what happens. Commands will be
simplified forms of natural language like ``get sword``, ``attack dragon``, ``go north``,
``enter shop`` or ``buy cheese``.
Matching sequences
------------------
Your main loop will need to get input from the user and split it into words; let's say
a list of strings like this::

    command = input("What are you doing next? ")
    # analyze the result of command.split()
The next step is to interpret the words. Most of our commands will have two words: an
action and an object. So you may be tempted to do the following::
    [action, obj] = command.split()
    ... # interpret action, obj
The problem with that line of code is that it's missing something: what if the user
types more or fewer than 2 words? To prevent this problem you can either check the length
of the list of words, or capture the ``ValueError`` that the statement above would raise.
You can use a ``match`` statement instead::
    match command.split():
        case [action, obj]:
            ... # interpret action, obj
The ``match`` statement evaluates the **subject** after the ``match`` keyword, and checks
it against the **pattern** next to ``case``. A pattern is able to do two different
things:
* Verify that the subject has a certain structure. In your case, the ``[action, obj]``
pattern matches any sequence of exactly two elements. This is called **matching**.
* Bind some names in the pattern to component elements of your subject. In this case,
if the list has two elements, it will bind ``action = subject[0]`` and
``obj = subject[1]``. This is called **destructuring**.
If there's a match, the statements inside the ``case`` clause will be executed with the
bound variables. If there's no match, nothing happens and execution continues with the
statement after the ``match`` block.
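For instance, this toy snippet (not part of the game loop) shows both matching and
destructuring at work::

    match "go north".split():
        case [action, obj]:
            print(action, obj)   # prints: go north
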
TODO: discuss other sequences, tuples. Discuss syntax with parenthesis. discuss
iterators? discuss [x, x] possibly later on?
Matching multiple patterns
--------------------------
Even if most commands have the action/object form, you might want to have user commands
of different lengths. For example you might want to add single verbs with no object like
``look`` or ``quit``. A match statement can (and is likely to) have more than one
``case``::
    match command.split():
        case [action]:
            ... # interpret single-verb action
        case [action, obj]:
            ... # interpret action, obj
The ``match`` statement will check patterns from top to bottom. If the pattern doesn't
match the subject, the next pattern will be tried. However, once the *first*
matching ``case`` clause is found, the body of that clause is executed, and all further
``case`` clauses are ignored. This is similar to the way that an ``if/elif/elif/...``
statement works.
Matching specific values
------------------------
Your code still needs to look at the specific actions and conditionally run
different logic depending on the specific action (e.g., ``quit``, ``attack``, or ``buy``).
You could do that using a chain of ``if/elif/elif/...``, or using a dictionary of
functions, but here we'll leverage pattern matching to solve that task. Instead of a
variable, you can use literal values in patterns (like ``"quit"``, ``42``, or ``None``).
This allows you to write::
    match command.split():
        case ["quit"]:
            print("Goodbye!")
            quit_game()
        case ["look"]:
            current_room.describe()
        case ["get", obj]:
            character.get(obj, current_room)
        case ["go", direction]:
            current_room = current_room.neighbor(direction)
        # The rest of your commands go here
A pattern like ``["get", obj]`` will match only 2-element sequences that have a first
element equal to ``"get"``. When destructuring, it will bind ``obj = subject[1]``.
As you can see in the ``go`` case, we can also use different variable names in
different patterns.
FIXME: This *might* be the place to explain a bit that when I say "literal" I mean it
literally, and a "soft constant" will not work :)
Matching slices
---------------
A player may be able to drop multiple objects by using a series of commands
``drop key``, ``drop sword``, ``drop cheese``. This interface might be cumbersome, and
you might like to allow dropping multiple items in a single command, like
``drop key sword cheese``. In this case you don't know beforehand how many words will
be in the command, but you can use extended unpacking in patterns in the same way that
it is allowed in assignments::
    match command.split():
        case ["drop", *objects]:
            for obj in objects:
                character.drop(obj, current_room)
        # The rest of your commands go here
This will match any sequence having ``"drop"`` as its first element. All remaining
elements will be captured in a ``list`` object which will be bound to the ``objects``
variable.
This syntax has the same restriction as sequence unpacking: you cannot have more than one
starred name in a pattern.
Adding a catch-all
------------------
You may want to print an error message saying that the command wasn't recognized when
all the patterns fail. You could use the feature we just learned and write the
following::
    match command.split():
        case ["quit"]: ... # Code omitted for brevity
        case ["go", direction]: ...
        case ["drop", *objects]: ...
        ... # Other case clauses
        case [*ignored_words]:
            print(f"Sorry, I couldn't understand {command!r}")
Note that you must add this pattern last; otherwise it will match before other
patterns that could apply. This works, but it's a bit verbose and
somewhat wasteful: this will make a full copy of the word list, which will be bound to
``ignored_words`` even if it's never used.
You can instead use the special pattern written ``_``, which always matches but
doesn't bind anything. That would allow you to rewrite the catch-all as::
    match command.split():
        ... # Other case clauses
        case [*_]:
            print(f"Sorry, I couldn't understand {command!r}")
This pattern will match any sequence. In this case we can simplify even more and
match any object::
    match command.split():
        ... # Other case clauses
        case _:
            print(f"Sorry, I couldn't understand {command!r}")
TODO: Explain about syntaxerror when having an irrefutable pattern above others?
How patterns are composed
-------------------------
This is a good moment to step back from the examples and understand how the patterns
that you have been using are built. Patterns can be nested within each other, and we
have been doing that implicitly in the examples above.
There are some "simple" patterns ("simple" here meaning that they do not contain other
patterns) that we've seen:
* **Literal patterns** (string literals, number literals, ``True``, ``False``, and
``None``)
* The **wildcard pattern** ``_``
* **Capture patterns** (stand-alone names like ``direction``, ``action``, ``objects``). We
never discussed these separately, but used them as part of other patterns. Note that
a capture pattern by itself will always match, and usually makes sense only
as a catch-all at the end of your ``match`` if you desire to bind the name to the
subject.
Until now, the only non-simple pattern we have experimented with is the sequence pattern.
Each element in a sequence pattern can in fact be
any other pattern. This means that you could write a pattern like
``["first", (left, right), *rest]``. This will match subjects which are a sequence of at
least two elements, where the first one is equal to ``"first"`` and the second one is
in turn a sequence of two elements. It will also bind ``left = subject[1][0]``,
``right = subject[1][1]``, and ``rest = subject[2:]``.
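For example, the following toy snippet (unrelated to the game) shows the nesting and
the resulting bindings::

    subject = ["first", (1, 2), "a", "b"]
    match subject:
        case ["first", (left, right), *rest]:
            print(left, right, rest)   # prints: 1 2 ['a', 'b']
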
Alternate patterns
------------------
Going back to the adventure game example, you may find that you'd like to have several
patterns resulting in the same outcome. For example, you might want the commands
``north`` and ``go north`` be equivalent. You may also desire to have aliases for
``get X``, ``pick up X`` and ``pick X up`` for any X.
The ``|`` symbol in patterns combines them as alternatives. You could for example write::
    match command.split():
        ... # Other case clauses
        case ["north"] | ["go", "north"]:
            current_room = current_room.neighbor("north")
        case ["get", obj] | ["pick", "up", obj] | ["pick", obj, "up"]:
            ... # Code for picking up the given object
This is called an **or pattern** and will produce the expected result. Patterns are
attempted from left to right; this may be relevant for knowing what is bound if more than
one alternative matches. An important restriction when writing or patterns is that all
alternatives should bind the same variables. So a pattern ``[1, x] | [2, y]`` is not
allowed because it would be unclear which variable would be bound after a successful
match. ``[1, x] | [2, x]`` is perfectly fine and will always bind ``x`` if successful.
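A tiny illustration of the allowed form (a toy snippet, not game code)::

    match [2, "hello"]:
        case [1, x] | [2, x]:
            print(x)   # prints: hello
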
Capturing matched sub-patterns
------------------------------
The first version of our "go" command was written with a ``["go", direction]`` pattern.
The change we made in our last version using the pattern ``["north"] | ["go", "north"]``
has some benefits but also some drawbacks in comparison: the latest version allows the
alias, but also has the direction hardcoded, which will force us to actually have
separate patterns for north/south/east/west. This leads to some code duplication, but at
the same time we get better input validation, and we will not be getting into that
branch if the command entered by the user is ``"go figure!"`` instead of a direction.
We could try to get the best of both worlds by doing the following (I'll omit the aliased
version without "go" for brevity)::
    match command.split():
        case ["go", ("north" | "south" | "east" | "west")]:
            current_room = current_room.neighbor(...)
            # how do I know which direction to go?
This code is a single branch, and it verifies that the word after "go" is really a
direction. But the code moving the player around needs to know which one was chosen and
has no way to do so. What we need is a pattern that behaves like the or pattern but at
the same time does a capture. We can do so with a **walrus pattern**::
    match command.split():
        case ["go", direction := ("north" | "south" | "east" | "west")]:
            current_room = current_room.neighbor(direction)
The walrus pattern (named like that because the ``:=`` operator looks like a sideways
walrus) matches whatever pattern is on its right hand side, but also binds the value to
a name.
Adding conditions to patterns
-----------------------------
The patterns we have explored above can do some powerful data filtering, but sometimes
you may wish for the full power of a boolean expression. Let's say that you would actually
like to allow a "go" command only in a restricted set of directions based on the possible
exits from the ``current_room``. We can achieve that by adding a **guard** to our
case-clause. Guards consist of the ``if`` keyword followed by any expression::
    match command.split():
        case ["go", direction] if direction in current_room.exits:
            current_room = current_room.neighbor(direction)
        case ["go", _]:
            print("Sorry, you can't go that way")
The guard is not part of the pattern; it's part of the case clause. It's only checked if
the pattern matches, and after all the pattern variables have been bound (that's why the
condition can use the ``direction`` variable in the example above). If the pattern
matches and the condition is truthy, the body of the case clause runs normally. If the
pattern matches but the condition is falsy, the match statement proceeds to check the
next ``case`` clause as if the pattern hadn't matched (with the possible side-effect of
having already bound some variables).
The sequence of these steps must be considered carefully when combining or-patterns and
guards. If you have ``case [x, 100] | [0, x] if x > 10`` and your subject is
``[0, 100]``, the clause will be skipped, as the sketch after this list shows. This happens because:
* The or-pattern finds the first alternative that matches the subject, which happens to
be ``[x, 100]``
* ``x`` is bound to 0
* The condition ``x > 10`` is checked. Given that it's false, the whole case clause is
skipped. The ``[0, x]`` pattern is never attempted.
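A small self-contained sketch of this behavior (illustrative only, not game code)::

    def check(subject):
        match subject:
            case [x, 100] | [0, x] if x > 10:
                return f"matched with x={x}"
            case _:
                return "no match"

    print(check([50, 100]))   # matched with x=50  (first alternative, guard is true)
    print(check([0, 100]))    # no match           (x bound to 0, guard is false)
    print(check([0, 42]))     # matched with x=42  (second alternative, guard is true)
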
Going to the cloud: Mappings
----------------------------
TODO: Give the motivating example of network requests, describe JSON-based "protocol"
TODO: partial matches, double stars
Matching objects
----------------
UI events motivations. describe events in dataclasses. inspiration for event objects
can be taken from https://www.pygame.org/docs/ref/event.html
example of getting constants from module (like key names for keyboard events)
customizing match_args?
Copyright
=========
This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: