PEP 634, PEP 635, PEP 636: Split PEP 622 in three parts (#1598)

parent 30eab3e5b4
commit 0c87b5a018

@@ -10,12 +10,13 @@ Author: Brandt Bucher <brandtbucher@gmail.com>,
         Talin <viridia@gmail.com>
 BDFL-Delegate:
 Discussions-To: Python-Dev <python-dev@python.org>
-Status: Draft
+Status: Superseded
 Type: Standards Track
 Content-Type: text/x-rst
 Created: 23-Jun-2020
 Python-Version: 3.10
 Post-History: 23-Jun-2020, 8-Jul-2020
+Superseded-By: 634
 Resolution:

@@ -0,0 +1,653 @@
PEP: 634
Title: Structural Pattern Matching: Specification
Version: $Revision$
Last-Modified: $Date$
Author: Brandt Bucher <brandtbucher@gmail.com>,
        Guido van Rossum <guido@python.org>
BDFL-Delegate:
Discussions-To: Python-Dev <python-dev@python.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 12-Sep-2020
Python-Version: 3.10
Post-History:
Replaces: 622
Resolution:


Abstract
========

**NOTE:** This draft is incomplete and not intended for review yet.
We're checking it into the peps repo for the convenience of the authors.

This PEP provides the technical specification for the ``match``
statement. It replaces PEP 622, which is hereby split in three parts:

- PEP 634: Specification
- PEP 635: Motivation and Rationale
- PEP 636: Tutorial

This PEP is intentionally devoid of commentary; all explanations of
design choices are in PEP 635. First-time readers are encouraged to
start with PEP 636, which provides a gentler introduction to the
concepts, syntax and semantics of patterns.

TODO: Maybe we should add simple examples back to each section?
There's no rule saying a spec can't include examples, and currently
it's *very* dry.

TODO: Go over the feedback from the SC and make sure everything's
somehow incorporated (either here or in PEP 635, which has to answer
why we didn't budge on most of the SC's initial requests).


Syntax and Semantics
====================

See `Appendix A`_ for the complete grammar.

Overview and terminology
------------------------

The pattern matching process takes as input a pattern (following
``case``) and a subject value (following ``match``). Phrases to
describe the process include "the pattern is matched with (or against)
the subject value" and "we match the pattern against (or with) the
subject value".

The primary outcome of pattern matching is success or failure. In
case of success we may say "the pattern succeeds", "the match
succeeds", or "the pattern matches the subject value".

In many cases a pattern contains subpatterns, and success or failure
is determined by the success or failure of matching those subpatterns
against the value (e.g., for OR patterns) or against parts of the
value (e.g., for sequence patterns). The subpatterns are typically
processed from left to right until the overall outcome is
determined. E.g., an OR pattern succeeds at the first succeeding
subpattern, while a sequence pattern fails at the first failing
subpattern.

A secondary outcome of pattern matching may be one or more name
bindings. We may say "the pattern binds a value to a name". When
subpatterns are tried until the first success, only the bindings due
to the successful subpattern are valid; when subpatterns are tried
until the first failure, the bindings are merged. Several more rules,
explained below, apply to these cases.


The ``match`` statement
-----------------------

Syntax::

    match_stmt: "match" match_expr ':' NEWLINE INDENT case_block+ DEDENT
    match_expr:
        | star_named_expression ',' star_named_expressions?
        | named_expression
    case_block: "case" patterns [guard] ':' block
    guard: 'if' named_expression

The rules ``star_named_expression``, ``star_named_expressions``,
``named_expression`` and ``block`` are part of the `standard Python
grammar <https://docs.python.org/3.10/reference/grammar.html>`_.

The rule ``patterns`` is specified below.

For context, ``match_stmt`` is a new alternative for
``compound_stmt``::

    compound_stmt:
        | if_stmt
        ...
        | match_stmt

The ``match`` and ``case`` keywords are soft keywords, i.e. they are
not reserved words in other grammatical contexts (including at the
start of a line if there is no colon where expected). This implies
that they are recognized as keywords only when part of a ``match``
statement or ``case`` block, and are allowed to be used in all
other contexts as variable or argument names.
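
A minimal sketch of the soft-keyword rule, assuming Python 3.10 or later (the release where the ``match`` statement shipped). Here ``match`` and ``case`` are ordinary variable names, while ``match`` followed by an expression and a colon still starts a ``match`` statement. The function name ``describe`` is hypothetical::

    # Outside a match statement, both words are plain identifiers.
    match = [1, 2, 3]
    case = len(match)

    def describe(value):
        # Here ``match`` is parsed as a keyword: an expression and a colon follow.
        match value:
            case 0:
                return "zero"
            case _:
                return "nonzero"

    result = describe(case)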


Match semantics
^^^^^^^^^^^^^^^

TODO: Make the language about choosing a block more precise.

The overall semantics for choosing the match is to choose the first
matching pattern (including guard) and execute the corresponding
block. The remaining patterns are not tried. If there are no
matching patterns, execution continues at the following statement.

Name bindings made during a successful pattern match outlive the
executed block and can be used after the ``match`` statement.

During failed pattern matches, some subpatterns may succeed. For
example, while matching the pattern ``(0, x, 1)`` with the value
``[0, 1, 2]``, the subpattern ``x`` may succeed if the list elements
are matched from left to right. The implementation may choose to
either make persistent bindings for those partial matches or not.
User code including a ``match`` statement should not rely on the
bindings being made for a failed match, but also shouldn't assume
that variables are unchanged by a failed match. This part of the
behavior is left intentionally unspecified so different
implementations can add optimizations, and to prevent introducing
semantic restrictions that could limit the extensibility of this
feature.

The precise pattern binding rules vary per pattern type and are
specified below.
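
The block-selection and binding rules can be illustrated with a small sketch, assuming Python 3.10 or later (the function names are hypothetical). The first case whose pattern matches wins. A subject that matches nothing falls through to the next statement. Names bound by the successful case remain usable after the statement::

    def first_match(subject):
        label = "no match"          # kept if no case matches
        match subject:
            case (0, 0):
                label = "origin"
            case (0, y):
                label = "y-axis"
            case (x, y):
                # (0, 0) would also fit here, but an earlier case already won.
                label = "plane"
        return label

    def binding_survives(subject):
        match subject:
            case (x, y):
                pass
        # Names bound by the successful case are still visible here.
        return x + y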


.. _guards:

Guards
^^^^^^

Syntax::

    case_block: "case" patterns [guard] ':' block
    guard: 'if' named_expression

If a guard is present on a case block, the expression in the guard is
evaluated once the case block's pattern succeeds.
If this raises an exception, the exception bubbles up.
Otherwise, if the condition is "truthy" the block is selected;
if it is "falsy" the next case block (if any) is tried.
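
The guard rules can be observed with a sketch like the following, assuming Python 3.10 or later (``is_small``, ``bucket`` and the ``calls`` log are hypothetical). The guard runs only after its pattern succeeds, and a falsy guard passes control to the next case block::

    calls = []

    def is_small(x):
        # Records each evaluation so we can see when the guard runs.
        calls.append(x)
        return x < 10

    def bucket(value):
        match value:
            case [x] if is_small(x):
                return "small singleton"
            case [x]:
                # Reached when the pattern above matched but its guard was falsy.
                return "large singleton"
            case _:
                return "other"

For a string subject the ``[x]`` pattern fails before the guard is reached, so ``is_small`` is never called for it.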


.. _patterns:

Patterns
--------

The top-level syntax for patterns is as follows::

    patterns: open_sequence_pattern | pattern
    pattern: walrus_pattern | or_pattern
    walrus_pattern: capture_pattern ':=' or_pattern
    or_pattern: '|'.closed_pattern+
    closed_pattern:
        | literal_pattern
        | capture_pattern
        | wildcard_pattern
        | constant_pattern
        | group_pattern
        | sequence_pattern
        | mapping_pattern
        | class_pattern


Walrus patterns
^^^^^^^^^^^^^^^

TODO: Change to ``or_pattern 'as' capture_pattern`` (and rename)?

Syntax::

    walrus_pattern: capture_pattern ':=' or_pattern

(Note: the name on the left may not be ``_``.)

A walrus pattern matches the OR pattern on the right of the ``:=``
operator against the subject. If this fails, the walrus pattern fails.
Otherwise, the walrus pattern binds the subject to the name on the left
of the ``:=`` operator and succeeds.
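
The ``:=`` spelling does not parse in any released Python. Python 3.10 shipped the ``or_pattern 'as' NAME`` form that the TODO above suggests. Using that released syntax, the semantics described here look roughly like this (the function name is hypothetical)::

    def small_number(value):
        match value:
            # The OR pattern is tried first; only if it succeeds is the
            # whole subject bound to ``n``.
            case (1 | 2 | 3) as n:
                return n
            case _:
                return None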


OR patterns
^^^^^^^^^^^

Syntax::

    or_pattern: '|'.closed_pattern+

When two or more patterns are separated by vertical bars (``|``),
this is called an OR pattern. (A single closed pattern is just that.)

Each subpattern must bind the same set of names.

An OR pattern matches each of its subpatterns in turn to the subject,
until one succeeds. The OR pattern is then deemed to succeed.
If none of the subpatterns succeed the OR pattern fails.
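
A short sketch of OR patterns, assuming Python 3.10 or later (``kind`` is a hypothetical name). Alternatives are tried left to right. Both alternatives in the second case bind the same name, as required above::

    def kind(command):
        match command:
            # Literal alternatives: the first one that compares equal wins.
            case "quit" | "exit" | "q":
                return "stop"
            # Each alternative binds exactly the name ``action``.
            case [action] | ("go", action):
                return action
            case _:
                return "unknown"

Writing ``case [a] | b:`` instead is rejected at compile time, because the two alternatives would bind different names.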


.. _literal_pattern:

Literal Patterns
^^^^^^^^^^^^^^^^

Syntax::

    literal_pattern:
        | signed_number
        | signed_number '+' NUMBER
        | signed_number '-' NUMBER
        | strings
        | 'None'
        | 'True'
        | 'False'
    signed_number: NUMBER | '-' NUMBER

The rule ``strings`` and the token ``NUMBER`` are defined in the
standard Python grammar.

Triple-quoted strings are supported. Raw strings and byte strings
are supported. F-strings are not supported.

The forms ``signed_number '+' NUMBER`` and ``signed_number '-' NUMBER``
are only permitted to express complex numbers; they require a
real number on the left and an imaginary number on the right.

A literal pattern succeeds if the subject value compares equal to the
value expressed by the literal, using the following comparison rules:

- Numbers and strings are compared using the ``==`` operator.

- The singleton literals ``None``, ``True`` and ``False`` are compared
  using the ``is`` operator.
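
The two comparison rules can be seen side by side in this sketch, assuming Python 3.10 or later (the function names are hypothetical). Because ``True == 1``, a ``1`` pattern accepts ``True``. A ``True`` pattern uses ``is`` and therefore rejects ``1``::

    def literal_kind(value):
        match value:
            case True:           # ``is`` comparison: only the bool singleton
                return "True literal"
            case 1:              # ``==`` comparison: 1 and 1.0 both qualify
                return "equal to 1"
            case -1 - 2j:        # complex literal: real part, sign, imaginary part
                return "complex"
            case "spam":
                return "string"
            case None:
                return "None literal"
            case _:
                return "other"

    def equals_one(value):
        match value:
            case 1:              # True == 1, so True matches here
                return True
            case _:
                return False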


.. _capture_pattern:

Capture Patterns
^^^^^^^^^^^^^^^^

Syntax::

    capture_pattern: !"_" NAME

The single underscore (``_``) is not a capture pattern (this is what
``!"_"`` expresses). It is treated as a `wildcard pattern`_.

A capture pattern always succeeds. It binds the subject value to the
name using the scoping rules for name binding established for the
walrus operator in PEP 572. (Summary: the name becomes a local
variable in the closest containing function scope unless there's an
applicable ``nonlocal`` or ``global`` statement.)

In a given pattern, a given name may be bound only once. This
disallows for example ``case x, x: ...`` but allows
``case [x] | x: ...``.


.. _wildcard_pattern:

Wildcard Pattern
^^^^^^^^^^^^^^^^

Syntax::

    wildcard_pattern: "_"

A wildcard pattern always succeeds. It binds no name.
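
The capture and wildcard rules can be checked with a sketch along these lines, assuming Python 3.10 or later (all names are hypothetical). ``is_rejected`` simply asks the compiler whether a snippet raises ``SyntaxError``::

    def second_item(pair):
        match pair:
            case (_, second):    # ``_`` matches the first item but binds nothing
                return second
            case _:
                return None

    def wildcard_binds_nothing():
        match (1, 2):
            case (_, b):
                pass
        # ``b`` is now a local variable; ``_`` was never assigned.
        return "b" in locals(), "_" in locals()

    def is_rejected(source):
        # True if compiling the snippet raises SyntaxError.
        try:
            compile(source, "<pattern-demo>", "exec")
        except SyntaxError:
            return True
        return False

    DUPLICATE_CAPTURE = "match p:\n    case x, x:\n        pass\n"
    SAME_NAME_IN_OR = "match p:\n    case [x] | x:\n        pass\n"
    REPEATED_WILDCARD = "match p:\n    case _, _:\n        pass\n"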


.. _constant_value_pattern:

Constant Value Patterns
^^^^^^^^^^^^^^^^^^^^^^^

TODO: Rename to Value Patterns? (But ``value[s]_pattern`` is already
a grammatical rule.)

Syntax::

    constant_pattern: attr
    attr: name_or_attr '.' NAME
    name_or_attr: attr | NAME

The dotted name in the pattern is looked up using the standard Python
name resolution rules. However, when the same constant pattern occurs
multiple times in the same ``match`` statement, the interpreter may cache
the first value found and reuse it, rather than repeat the same
lookup. (To clarify, this cache is strictly tied to a given execution
of a given ``match`` statement.)

The pattern succeeds if the value found thus compares equal to the
subject value (using the ``==`` operator).
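
Here is a sketch of constant value patterns, assuming Python 3.10 or later (``Color``, ``Limits`` and ``describe_value`` are hypothetical). Each dotted name is looked up and then compared with ``==``. A bare name such as ``MAX`` would instead be a capture pattern, which is why the grammar requires a dot::

    from enum import Enum

    class Color(Enum):
        RED = "red"
        GREEN = "green"

    class Limits:
        MAX = 10

    def describe_value(value):
        match value:
            case Color.RED:      # looked up, then compared with ==
                return "stop"
            case Color.GREEN:
                return "go"
            case Limits.MAX:     # 10 == 10.0, so floats can match too
                return "at the limit"
            case _:
                return "unknown"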


Group Patterns
^^^^^^^^^^^^^^

Syntax::

    group_pattern: '(' pattern ')'

(For the syntax of ``pattern``, see Patterns above. Note that it
contains no comma -- a parenthesized series of items with at least one
comma is a sequence pattern, as is ``()``.)

A parenthesized pattern has no additional syntax. It allows users to
add parentheses around patterns to emphasize the intended grouping.
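
The difference between a group pattern and a one-item sequence pattern shows up in a sketch like this, assuming Python 3.10 or later (``shape`` is a hypothetical name)::

    def shape(subject):
        match subject:
            case (x,):           # trailing comma: a one-item sequence pattern
                return ("one-item sequence", x)
            case (0 | 1):        # parentheses only group the OR pattern
                return ("bit", subject)
            case (x):            # a group pattern: same as the bare capture ``x``
                return ("anything", x)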


.. _sequence_pattern:

Sequence Patterns
^^^^^^^^^^^^^^^^^

Syntax::

    sequence_pattern:
        | '[' [values_pattern] ']'
        | '(' [open_sequence_pattern] ')'
    open_sequence_pattern: value_pattern ',' [values_pattern]
    values_pattern: ','.value_pattern+ ','?
    value_pattern: star_pattern | pattern
    star_pattern: '*' (capture_pattern | wildcard_pattern)

(Note that a single parenthesized pattern without a trailing comma is
a group pattern, not a sequence pattern. However, a single pattern
enclosed in ``[...]`` is still a sequence pattern.)

There is no semantic difference between a sequence pattern using
``[...]``, a sequence pattern using ``(...)``, and an open sequence
pattern.

A sequence pattern may contain at most one star subpattern. The star
subpattern may occur in any position. If no star subpattern is
present, the sequence pattern is a fixed-length sequence pattern;
otherwise it is a variable-length sequence pattern.

A sequence pattern fails if the subject value is not an instance of
``collections.abc.Sequence``. It also fails if the subject value is
an instance of ``str``, ``bytes`` or ``bytearray``.

A fixed-length sequence pattern fails if the length of the subject
sequence is not equal to the number of subpatterns.

A variable-length sequence pattern fails if the length of the subject
sequence is less than the number of non-star subpatterns.

The length of the subject sequence is obtained using the builtin
``len()`` function (i.e., via the ``__len__`` protocol). However, the
interpreter may cache this value in a similar manner as described for
constant value patterns.

A fixed-length sequence pattern matches the subpatterns to
corresponding items of the subject sequence, from left to right.
Matching stops (with a failure) as soon as a subpattern fails. If all
subpatterns succeed in matching their corresponding item, the sequence
pattern succeeds.

A variable-length sequence pattern first matches the leading non-star
subpatterns to the corresponding items of the subject sequence, as for
a fixed-length sequence. If this succeeds, the star subpattern
matches a list formed of the remaining subject items, with items
removed from the end corresponding to the non-star subpatterns
following the star subpattern. The remaining non-star subpatterns are
then matched to the corresponding subject items, as for a fixed-length
sequence.
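
The variable-length rules above can be mirrored in plain Python. This is a sketch only. ``split_star`` is a hypothetical helper, not part of any library, and it covers just the type check, the length check and the split, not the matching of individual subpatterns. ``first_middle_last`` performs the same split with a real ``match`` statement (Python 3.10 or later)::

    from collections.abc import Sequence

    def split_star(subject, n_before, n_after):
        """Mirror a pattern with n_before items, one star, then n_after items.

        Returns (leading, star, trailing) lists, or None when the pattern fails.
        """
        # Non-sequences fail, and so do str, bytes and bytearray.
        if not isinstance(subject, Sequence) or isinstance(
                subject, (str, bytes, bytearray)):
            return None
        n = len(subject)
        # Fail if there are fewer items than non-star subpatterns.
        if n < n_before + n_after:
            return None
        leading = list(subject[:n_before])
        star = list(subject[n_before:n - n_after])   # the star part is always a list
        trailing = list(subject[n - n_after:])
        return leading, star, trailing

    def first_middle_last(subject):
        match subject:
            case [first, *middle, last]:
                return [first], middle, [last]
            case _:
                return None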


.. _mapping_pattern:

Mapping Patterns
^^^^^^^^^^^^^^^^

Syntax::

    mapping_pattern: '{' [items_pattern] '}'
    items_pattern: ','.key_value_pattern+ ','?
    key_value_pattern:
        | (literal_pattern | constant_pattern) ':' or_pattern
        | double_star_pattern
    double_star_pattern: '**' capture_pattern

(Note that ``**_`` is disallowed by this syntax.)

A mapping pattern may contain at most one double star pattern,
and it must be last.

A mapping pattern may not contain duplicate key values.
(If all key patterns are literal patterns this is considered a
syntax error; otherwise this is a runtime error and will
raise ``ValueError``.)

A mapping pattern fails if the subject value is not an instance of
``collections.abc.Mapping``.

A mapping pattern succeeds if every key given in the mapping pattern
is present in the subject mapping, and the pattern for each key
matches the corresponding item of the subject mapping. If a
``'**' NAME`` form is present, that name is bound to a ``dict``
containing the remaining key-value pairs from the subject mapping.

If duplicate keys are detected in the mapping pattern, the pattern is
considered invalid, and a ``ValueError`` is raised.

Key-value pairs are matched using the two-argument form of the
subject's ``get()`` method. As a consequence, matched key-value pairs
must already be present in the mapping, and not created on-the-fly by
``__missing__`` or ``__getitem__``. For example,
``collections.defaultdict`` instances will only be matched by patterns
with keys that were already present when the ``match`` statement was
entered.
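
Here is a sketch of these mapping rules, assuming Python 3.10 or later (``parse_event`` and the event dictionaries are hypothetical). Extra keys are ignored unless ``**rest`` collects them. Because keys are looked up with ``get()``, an empty ``defaultdict`` gains no entries during matching::

    from collections import defaultdict

    def parse_event(event):
        match event:
            case {"type": "click", "pos": (x, y), **rest}:
                # ``rest`` is a new dict with the keys not named in the pattern.
                return ("click", x, y, rest)
            case {"type": kind}:
                # Any other keys in the subject are simply ignored.
                return (kind,)
            case _:
                return None

    counts = defaultdict(int)
    # "type" is missing, so no case matches; __missing__ is never triggered.
    no_match = parse_event(counts)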


.. _class_pattern:

Class Patterns
^^^^^^^^^^^^^^

Syntax::

    class_pattern:
        | name_or_attr '(' [pattern_arguments ','?] ')'
    pattern_arguments:
        | positional_patterns [',' keyword_patterns]
        | keyword_patterns
    positional_patterns: ','.pattern+
    keyword_patterns: ','.keyword_pattern+
    keyword_pattern: NAME '=' or_pattern

(Note that positional patterns may be unparenthesized walrus patterns,
but keyword patterns may not.)

A class pattern may not repeat the same keyword multiple times.

If ``name_or_attr`` is not an instance of the builtin ``type``,
``TypeError`` is raised.

A class pattern fails if the subject is not an instance of ``name_or_attr``.
This is tested using ``isinstance()``.

If no arguments are present, the pattern succeeds if the ``isinstance()``
check succeeds. Otherwise:

- If only keyword patterns are present, they are processed as follows,
  one by one:

  - The keyword is looked up as an attribute on the subject.

    - If this raises an exception other than ``AttributeError``,
      the exception bubbles up.

    - If this raises ``AttributeError`` the class pattern fails.

    - Otherwise, the subpattern associated with the keyword is matched
      against the attribute value. If this fails, the class pattern fails.
      If it succeeds, the match proceeds to the next keyword.

  - If all keyword patterns succeed, the class pattern as a whole succeeds.

- If any positional patterns are present, they are converted to keyword
  patterns (see below) and treated as additional keyword patterns,
  preceding the syntactic keyword patterns (if any).
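
Here is a sketch of keyword-only class patterns, assuming Python 3.10 or later (``Point`` and ``where`` are hypothetical). Each keyword becomes an attribute lookup on the subject, and a missing attribute simply makes the pattern fail::

    class Point:
        def __init__(self, x, y):
            self.x = x
            self.y = y

    def where(obj):
        match obj:
            case Point(x=0, y=0):    # isinstance() check, then obj.x and obj.y
                return "origin"
            case Point(x=0, y=y):
                return f"on the y-axis at {y}"
            case Point(z=_):         # no attribute z: AttributeError, so no match
                return "has z"
            case Point():            # no arguments: just the isinstance() check
                return "some point"
            case _:
                return "not a point"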

Positional patterns are converted to keyword patterns using the
``__match_args__`` attribute on the class designated by ``name_or_attr``,
as follows:

- For a number of built-in types (specified below),
  a single positional subpattern is accepted which will match
  the entire subject; for these types no keyword patterns are accepted.
- The equivalent of ``getattr(cls, "__match_args__", ())`` is called.
- If this raises an exception the exception bubbles up.
- If the returned value is not a list or tuple, the conversion fails
  and ``TypeError`` is raised.
- If there are more positional patterns than the length of
  ``__match_args__`` (as obtained using ``len()``), ``TypeError`` is raised.
- Otherwise, positional pattern ``i`` is converted to a keyword pattern
  using ``__match_args__[i]`` as the keyword,
  provided the latter is a string;
  if it is not, ``TypeError`` is raised.
- For duplicate keywords, ``TypeError`` is raised.

Once the positional patterns have been converted to keyword patterns,
the match proceeds as if there were only keyword patterns.
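
The conversion steps can be sketched in plain Python. ``positional_to_keywords`` below is a hypothetical helper that follows the list above for user-defined classes and leaves out the special case for built-in types. It accepts a list or a tuple, as this draft does, whereas Python 3.10 as released accepts only a tuple for ``__match_args__``. ``Vector`` and ``unit_x`` are also hypothetical, and ``unit_x`` needs Python 3.10 or later::

    def positional_to_keywords(cls, n_positional, keywords=()):
        """Return the attribute names a class pattern would look up, in order."""
        match_args = getattr(cls, "__match_args__", ())
        if not isinstance(match_args, (list, tuple)):
            raise TypeError("__match_args__ must be a list or tuple")
        if n_positional > len(match_args):
            raise TypeError(
                f"{cls.__name__}() accepts {len(match_args)} positional "
                f"sub-patterns ({n_positional} given)")
        names = []
        for i in range(n_positional):
            if not isinstance(match_args[i], str):
                raise TypeError("__match_args__ elements must be strings")
            names.append(match_args[i])
        # Converted positional names come first, then the explicit keywords.
        names.extend(keywords)
        if len(set(names)) != len(names):
            raise TypeError("duplicate keyword in class pattern")
        return names

    class Vector:
        __match_args__ = ("x", "y")   # a tuple, as released Python requires

        def __init__(self, x, y):
            self.x = x
            self.y = y

    def unit_x(v):
        match v:
            case Vector(1, y=0):      # equivalent to Vector(x=1, y=0)
                return True
            case _:
                return False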

As mentioned above, for the following built-in types the handling of
positional subpatterns is different:
``bool``, ``bytearray``, ``bytes``, ``dict``, ``float``,
``frozenset``, ``int``, ``list``, ``set``, ``str``, and ``tuple``.

This behavior is roughly equivalent to the following::

    class C:
        __match_args__ = ["__match_self_prop__"]
        @property
        def __match_self_prop__(self):
            return self
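
With Python 3.10 or later, the special handling of these built-in types can be tried directly. ``tag`` is a hypothetical function. Note that ``bool`` must come before ``int``, since ``True`` is also an ``int`` instance::

    def tag(value):
        match value:
            case bool(b):               # the subpattern receives the whole subject
                return ("bool", b)
            case int(n):
                return ("int", n)
            case str() | bytes():       # isinstance() check only
                return ("text", value)
            case list([first, *_]):     # the subpattern can itself be a sequence pattern
                return ("list starting with", first)
            case _:
                return ("other", value)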


Side effects
============

The only side-effect produced explicitly by the matching process is
the binding of names. However, the process relies on attribute
access, instance checks, ``len()``, equality and item access on the
subject and some of its components. It also evaluates constant value
patterns and the class name of class patterns. While none of those
typically create any side-effects, in theory they could. This
proposal intentionally leaves out any specification of what methods
are called or how many times. This behavior is therefore undefined
and user code should not rely on it.


The standard library
====================

To facilitate the use of pattern matching, several changes will be
made to the standard library:

- Namedtuples and dataclasses will have auto-generated
  ``__match_args__``.

- For dataclasses the order of attributes in the generated
  ``__match_args__`` will be the same as the order of corresponding
  arguments in the generated ``__init__()`` method. This includes the
  situations where attributes are inherited from a superclass. Fields
  with ``init=False`` are excluded from ``__match_args__``.

In addition, a systematic effort will be put into going through
existing standard library classes and adding ``__match_args__`` where
it looks beneficial.
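
In Python 3.10 and later, ``collections.namedtuple`` and ``dataclasses.dataclass`` do generate ``__match_args__`` along these lines. The sketch below (``Pair``, ``Base``, ``Child`` and ``unpack`` are hypothetical) shows the inherited field order and the exclusion of an ``init=False`` field::

    from collections import namedtuple
    from dataclasses import dataclass, field

    Pair = namedtuple("Pair", ["left", "right"])

    @dataclass
    class Base:
        a: int

    @dataclass
    class Child(Base):
        b: int
        # Not an __init__ argument, so it is left out of __match_args__.
        cache: dict = field(default_factory=dict, init=False)

    def unpack(obj):
        match obj:
            case Pair(left, right):
                return ("pair", left, right)
            case Child(a, b):           # same order as Child.__init__(a, b)
                return ("child", a, b)
            case _:
                return None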


.. _Appendix A:

Appendix A -- Full Grammar
==========================

TODO: Go over the differences with the reference implementation and
resolve them (either by fixing the PEP or by fixing the reference
implementation).

Here is the full grammar for ``match_stmt``. This is an additional
alternative for ``compound_stmt``. Remember that ``match`` and
``case`` are soft keywords, i.e. they are not reserved words in other
grammatical contexts (including at the start of a line if there is no
colon where expected). By convention, hard keywords use single quotes
while soft keywords use double quotes.

Other notation used beyond standard EBNF:

- ``SEP.RULE+`` is shorthand for ``RULE (SEP RULE)*``
- ``!RULE`` is a negative lookahead assertion

::

    match_stmt: "match" match_expr ':' NEWLINE INDENT case_block+ DEDENT
    match_expr:
        | star_named_expression ',' [star_named_expressions]
        | named_expression
    case_block: "case" patterns [guard] ':' block
    guard: 'if' named_expression

    patterns: open_sequence_pattern | pattern
    pattern: walrus_pattern | or_pattern
    walrus_pattern: capture_pattern ':=' or_pattern
    or_pattern: '|'.closed_pattern+
    closed_pattern:
        | literal_pattern
        | capture_pattern
        | wildcard_pattern
        | constant_pattern
        | group_pattern
        | sequence_pattern
        | mapping_pattern
        | class_pattern

    literal_pattern:
        | signed_number !('+' | '-')
        | signed_number '+' NUMBER
        | signed_number '-' NUMBER
        | strings
        | 'None'
        | 'True'
        | 'False'
    signed_number: NUMBER | '-' NUMBER

    capture_pattern: !"_" NAME !('.' | '(' | '=')

    wildcard_pattern: "_"

    constant_pattern: attr !('.' | '(' | '=')
    attr: name_or_attr '.' NAME
    name_or_attr: attr | NAME

    group_pattern: '(' pattern ')'

    sequence_pattern:
        | '[' [values_pattern] ']'
        | '(' [open_sequence_pattern] ')'
    open_sequence_pattern: value_pattern ',' [values_pattern]
    values_pattern: ','.value_pattern+ ','?
    value_pattern: star_pattern | pattern
    star_pattern: '*' (capture_pattern | wildcard_pattern)

    mapping_pattern: '{' [items_pattern] '}'
    items_pattern: ','.key_value_pattern+ ','?
    key_value_pattern:
        | (literal_pattern | constant_pattern) ':' or_pattern
        | double_star_pattern
    double_star_pattern: '**' capture_pattern

    class_pattern:
        | name_or_attr '(' [pattern_arguments ','?] ')'
    pattern_arguments:
        | positional_patterns [',' keyword_patterns]
        | keyword_patterns
    positional_patterns: ','.pattern+
    keyword_patterns: ','.keyword_pattern+
    keyword_pattern: NAME '=' or_pattern


Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:

@@ -0,0 +1,977 @@
PEP: 635
Title: Structural Pattern Matching: Motivation and Rationale
Version: $Revision$
Last-Modified: $Date$
Author: Tobias Kohn <kohnt@tobiaskohn.ch>,
        Guido van Rossum <guido@python.org>
BDFL-Delegate:
Discussions-To: Python-Dev <python-dev@python.org>
Status: Draft
Type: Informational
Content-Type: text/x-rst
Created: 12-Sep-2020
Python-Version: 3.10
Post-History:
Resolution:


Abstract
========

**NOTE:** This draft is incomplete and not intended for review yet.
We're checking it into the peps repo for the convenience of the authors.

This PEP provides the motivation and rationale for PEP 634
("Structural Pattern Matching: Specification"). First-time readers
are encouraged to start with PEP 636, which provides a gentler
introduction to the concepts, syntax and semantics of patterns.


Motivation
==========

(Structural) pattern matching syntax is found in many languages, from
Haskell, Erlang and Scala to Elixir and Ruby. (A proposal for
JavaScript is also under consideration.)

Python already supports a limited form of this through sequence
unpacking assignments, which the new proposal leverages.

Several other common Python idioms are also relevant:

- The ``if ... elif ... elif ... else`` idiom is often used to find
  out the type or shape of an object in an ad-hoc fashion, using one
  or more checks like ``isinstance(x, cls)``, ``hasattr(x, "attr")``,
  ``len(x) == n`` or ``"key" in x`` as guards to select an applicable
  block. The block can then assume ``x`` supports the interface
  checked by the guard. For example::

      if isinstance(x, tuple) and len(x) == 2:
          host, port = x
          mode = "http"
      elif isinstance(x, tuple) and len(x) == 3:
          host, port, mode = x
      # Etc.

  Code like this is more elegantly rendered using ``match``::

      match x:
          case host, port:
              mode = "http"
          case host, port, mode:
              pass
          # Etc.

- AST traversal code often looks for nodes matching a given pattern,
  for example the code to detect a node of the shape "A + B * C" might
  look like this::

      if (isinstance(node, BinOp) and node.op == "+"
              and isinstance(node.right, BinOp) and node.right.op == "*"):
          a, b, c = node.left, node.right.left, node.right.right
          # Handle a + b*c

  Using ``match`` this becomes more readable::

      match node:
          case BinOp("+", a, BinOp("*", b, c)):
              # Handle a + b*c
              ...

- TODO: Other compelling examples?

We believe that adding pattern matching to Python will enable Python
users to write cleaner, more readable code for examples like those
above, and many others.

Pattern matching and OO
-----------------------

Pattern matching is complementary to the object-oriented paradigm.
Using OO and inheritance we can easily define a method on a base class
that defines default behavior for a specific operation on that class,
and we can override this default behavior in subclasses. We can also
use the Visitor pattern to separate actions from data.

But this is not sufficient for all situations. For example, a code
generator may consume an AST, and have many operations where the
generated code needs to vary based not just on the class of a node,
but also on the values of some of its attributes, like the ``BinOp``
example above. The Visitor pattern is insufficiently flexible for
this: it can only select based on the class.

For a complete example, see
https://github.com/gvanrossum/patma/blob/master/examples/expr.py#L231

TODO: Could we say more here?

Pattern matching and functional style
-------------------------------------

Most Python applications and libraries are not written in a consistent
OO style -- unlike Java, Python encourages defining functions at the
top-level of a module, and for simple data structures, tuples (or
named tuples or lists) and dictionaries are often used exclusively or
mixed with classes or data classes.

Pattern matching is particularly suitable for picking apart such data
structures. As an extreme example, it's easy to write code that picks
apart a JSON data structure using ``match``.

TODO: Example code.


Rationale
=========

TBD.

This section should provide the rationale for individual design decisions.
It takes the place of "Rejected ideas" in the standard PEP format.
It is organized in sections corresponding to the specification (PEP 634).


Overview and terminology
------------------------


The ``match`` statement
-----------------------

The match statement evaluates an expression to produce a subject, finds the
first pattern that matches the subject and executes the associated block
of code. Syntactically, the match statement thus takes an expression and
a sequence of case clauses, where each case clause comprises a pattern and
a block of code.

Since case clauses comprise a block of code, they adhere to the existing
indentation scheme with the syntactic structure of
``<keyword> ...: <(indented) block>``, which in turn makes it a (compound)
statement. The chosen keyword ``case`` reflects its widespread use in
pattern matching languages, ignoring those languages that use other
syntactic means such as a symbol like ``|``, because it would not fit
established Python structures. The syntax of patterns following the
keyword is discussed below.

Given that the case clauses follow the structure of a compound statement,
the match statement itself naturally becomes a compound statement as
well, following the same syntactic structure. This naturally leads to
``match <expr>: <case_clause>+``. Note that the match statement determines
a quasi-scope in which the evaluated subject is kept alive (although not in
a local variable), similar to how a ``with`` statement might keep a resource
alive during execution of its block. Furthermore, control flows from the
match statement to a case clause and then leaves the block of the match
statement. The block of the match statement thus has both syntactic and
semantic meaning.
|
||||
|
||||
Various suggestions have sought to eliminate or avoid the naturally arising
|
||||
"double indentation" of a case clause's code block. Unfortunately, all such
|
||||
proposals of *flat indentation schemes* come at the expense of violating
|
||||
Python's establish structural paradigm, leading to additional syntactic
|
||||
rules:
|
||||
|
||||
- *Unindented case clauses.*
|
||||
The idea is to align case clauses with the ``match``, i.e.::
|
||||
|
||||
match expression:
|
||||
case pattern_1:
|
||||
...
|
||||
case pattern_2:
|
||||
...
|
||||
|
||||
This may look awkward to the eye of a Python programmer, because
|
||||
everywhere else colon is followed by an indent. The ``match`` would
|
||||
neither follow the syntactic scheme of simple nor composite statements
|
||||
but rather establish a category of its own.
|
||||
|
||||
- *Putting the expression on a separate line after ``match``.*
|
||||
The idea is to use the expression yielding the subject as a statement
|
||||
to avoid the singularity of ``match`` having no actual block despite
|
||||
the colons::
|
||||
|
||||
match:
|
||||
expression
|
||||
case pattern_1:
|
||||
...
|
||||
case pattern_2:
|
||||
...
|
||||
|
||||
This was ultimately rejected because the first block would be another
|
||||
novelty in Python's grammar: a block whose only content is a single
|
||||
expression rather than a sequence of statements. Attempts to amend this
|
||||
issue by adding or repurposing yet another keyword along the lines of
|
||||
``match: return expression`` did not yield any satisfactory solution.
|
||||
|
||||
Although flat indentation would save some horizontal space, the cost of
|
||||
increased complexity or unusual rules is too high. It would also complicate
|
||||
life for simple-minded code editors. Finally, the horizontal space issue can
|
||||
be alleviated by allowing "half-indent" (i.e. two spaces instead of four)
|
||||
for match statements.
|
||||
|
||||
In sample programs using match, written as part of the development of this
|
||||
PEP, a noticeable improvement in code brevity is observed, more than making
|
||||
up for the additional indentation level.
|
||||
|
||||
|
||||
*Statement v Expression.* Some suggestions centered around the idea of
|
||||
making ``match`` an expression rather than a statement. However, this
|
||||
would fit poorly with Python's statement-oriented nature and lead to
|
||||
unusually long and complex expressions with the need to invent new
|
||||
syntactic constructs or break well established syntactic rules. An
|
||||
obvious consequence of ``match`` as an expression would be that case
|
||||
clauses could no longer have abitrary blocks of code attached, but only
|
||||
a single expression. Overall, the strong limitations could in no way
|
||||
offset the slight simplification in some special use cases.
|
||||
|
||||
|
||||
|
||||
Match semantics
|
||||
~~~~~~~~~~~~~~~
|
||||
|
||||
The patterns of different case clauses might overlap in that more than
|
||||
one case clause would match a given subject. The first-to-match rule
|
||||
ensures that the selection of a case clause for a given subject is
|
||||
unambiguous. Furthermore, case clauses can have increasingly general
|
||||
patterns matching wider classes of subjects. The first-to-match rule
|
||||
then ensures that the most precise pattern can be chosen (although it
|
||||
is the programmer's responsibility to order the case clauses correctly).

In a statically typed language, the match statement would be compiled to
a decision tree to select a matching pattern quickly and very efficiently.
This would, however, require that all patterns be purely declarative and
static, running against the established dynamic semantics of Python. The
proposed semantics thus represent a path incorporating the best of both
worlds: patterns are tried in a strictly sequential order so that each
case clause constitutes an actual statement. At the same time, we allow
the interpreter to cache any information about the subject or change the
order in which subpatterns are tried. In other words: if the interpreter
has found that the subject is not an instance of a class ``C``, it can
directly skip case clauses testing for this again, without having to
perform repeated instance checks. If a guard stipulates that a variable
``x`` must be positive, say (i.e. ``if x > 0``), the interpreter might
check this directly after binding ``x`` and before any further
subpatterns are considered.


*Binding and scoping.* In many pattern matching implementations, each
case clause would establish a separate scope of its own. Variables bound
by a pattern would then only be visible inside the corresponding case block.
In Python, however, this does not make sense. Establishing separate scopes
would essentially mean that each case clause is a separate function without
direct access to the variables in the surrounding scope (without having to
resort to ``nonlocal``, that is). Moreover, a case clause could no longer
influence any surrounding control flow through standard statements such as
``return`` or ``break``. Hence, such strict scoping would lead to
unintuitive and surprising behavior.

A direct consequence of this is that any variable bindings outlive the
respective case or match statements. Even patterns that only match a
subject partially might bind local variables (this is, in fact, necessary
for guards to function properly). However, this escaping of variable
bindings is in line with existing Python structures such as for loops and
with statements.


.. _patterns:

Patterns
--------

Patterns fulfill two purposes: they impose (structural) constraints on
the subject and they specify which data values should be extracted from
the subject and bound to variables. In iterable unpacking, which can be
seen as a prototype to pattern matching in Python, there is only one
*structural pattern* to express sequences while there is a rich set of
*binding patterns* to assign a value to a specific variable or field.
Full pattern matching differs from this in that there is more variety
in structural patterns but only a minimum of binding patterns.

Patterns differ from assignment targets (as in iterable unpacking) in that
they impose additional constraints on the structure of the subject and in
that a subject might safely fail to match a specific pattern at any point
(in iterable unpacking, this constitutes an error). The latter means that
patterns should avoid side effects wherever possible, including binding
values to attributes or subscripts.

A cornerstone of pattern matching is the possibility of arbitrarily
*nesting patterns*. The nesting allows for expressing deep
tree structures (for an example of nested class patterns, see the motivation
section above) as well as alternatives.

Although the structural patterns might superficially look like expressions,
it is important to keep in mind that there is a clear distinction. In fact,
no pattern is or contains an expression. It is more productive to think of
patterns as declarative elements similar to the formal parameters in a
function definition.


Walrus patterns
~~~~~~~~~~~~~~~



OR patterns
~~~~~~~~~~~

The OR pattern allows you to combine 'structurally equivalent' alternatives
into a new pattern, i.e. several patterns can share a common handler. If any
one of an OR pattern's subpatterns matches the given subject, the entire OR
pattern succeeds.

Statically typed languages prohibit the binding of names (capture patterns)
inside an OR pattern because of potential conflicts concerning the types of
variables. As a dynamically typed language, Python can be less restrictive
here and allow capture patterns inside OR patterns. However, each subpattern
must bind the same set of variables so as not to leave potentially undefined
names. With two alternatives ``P | Q``, this means that if *P* binds the
variables *u* and *v*, *Q* must bind exactly the same variables *u* and *v*.

There was some discussion on whether to use the bar ``|`` or the keyword
``or`` in order to separate alternatives. The OR pattern does not fully fit
the existing semantics and usage of either of these two symbols. However,
``|`` is the symbol of choice in all programming languages that support
the OR pattern and is even used in that capacity for regular expressions in
Python as well. Moreover, ``|`` is not only used for bitwise OR, but also
for set unions and dict merging (:pep:`584`).

Other alternatives were considered as well, but none of these would allow
OR patterns to be nested inside other patterns:

- *Using a comma*::

      case 401, 403, 404:
          print("Some HTTP error")

  This looks too much like a tuple -- we would have to find a different way
  to spell tuples, and the construct would have to be parenthesized inside
  the argument list of a class pattern. In general, commas already have many
  different meanings in Python; we shouldn't add more.

- *Using stacked cases*::

      case 401:
      case 403:
      case 404:
          print("Some HTTP error")

  This is how this would be done in *C*, using its fall-through semantics
  for cases. However, we don't want to mislead people into thinking that
  match/case uses fall-through semantics (which are a common source of bugs
  in *C*). Also, this would be a novel indentation pattern, which might make
  it harder to support in IDEs and such (it would break the simple rule "add
  an indentation level after a line ending in a colon"). Finally, this
  would not support OR patterns nested inside other patterns.

- *Using* ``case in`` *followed by a comma-separated list*::

      case in 401, 403, 404:
          print("Some HTTP error")

  This would not work for OR patterns nested inside other patterns, like::

      case Point(0|1, 0|1):
          print("A corner of the unit square")


*AND and NOT patterns.*
This proposal defines an OR pattern (``|``) to match one of several
alternatives; why not also an AND pattern (``&``) or even a NOT pattern
(``!``)? Especially given that some other languages (``F#`` for example)
support AND patterns.

However, it is not clear how useful this would be. The semantics for matching
dictionaries, objects and sequences already incorporate an implicit 'and':
all attributes and elements mentioned must be present for the match to
succeed. Guard conditions can also support many of the use cases that a
hypothetical 'and' operator would be used for.

A negation of a match pattern using the operator ``!`` as a prefix would match
exactly if the pattern itself does not match. For instance, ``!(3 | 4)``
would match anything except ``3`` or ``4``. However, there is evidence from
other languages that this is rarely useful and primarily used as double
negation ``!!`` to control variable scopes and prevent variable bindings
(which does not apply to Python).

In the end, it was decided that this would make the syntax more complex
without adding a significant benefit.


Example::

    def simplify(expr):
        match expr:
            case ('/', 0, 0):
                return expr
            case ('*' | '/', 0, _):
                return 0
            case ('+' | '-', x, 0) | ('+', 0, x) | ('*', 1, x) | ('*' | '/', x, 1):
                return x
        return expr


.. _capture_pattern:

Capture Patterns
~~~~~~~~~~~~~~~~

Capture patterns take on the form of a name that accepts any value and binds
it to a (local) variable (unless the name is declared as ``nonlocal`` or
``global``). In that sense, a simple capture pattern is basically equivalent
to a parameter in a function definition (when the function is called, each
parameter binds the respective argument to a local variable in the function's
scope).

A name used for a capture pattern must not coincide with another capture
pattern in the same pattern. This, again, is similar to parameters, which
equally require each parameter name to be unique within the list of
parameters. It differs, however, from iterable unpacking assignment, where
the repeated use of a variable name as target is permissible (e.g.,
``x, x = 1, 2``). The rationale for not supporting ``(x, x)`` in patterns
is its ambiguous reading: it could be seen as in iterable unpacking where
only the second binding to ``x`` survives. But it could be equally seen as
expressing a tuple with two equal elements (which comes with its own issues).
Should the need arise, then it is still possible to introduce support for
repeated use of names later on.

There were calls to explicitly mark capture patterns and thus identify them
as binding targets. According to that idea, a capture pattern would be
written as, e.g. ``?x`` or ``$x``. The aim of such explicit capture markers
is to let an unmarked name be a constant value pattern (see below). However,
this is based on the misconception that pattern matching is an extension of
*switch* statements, placing the emphasis on fast switching based on
(ordinal) values. Such a *switch* statement has indeed been proposed for
Python before (see :pep:`275` and :pep:`3103`). Pattern matching, on the other
hand, builds a generalized concept of iterable unpacking. Binding values
extracted from a data structure is at the very core of the concept and hence
the most common use case. Explicit markers for capture patterns would thus
betray the objective of the proposed pattern matching syntax and simplify
a secondary use case at the expense of additional syntactic clutter for
core cases.

Example::

    def average(*args):
        match args:
            case [x, y]:           # captures the two elements of a sequence
                return (x + y) / 2
            case [x]:              # captures the only element of a sequence
                return x
            case []:
                return 0
            case x:                # captures the entire sequence
                return sum(x) / len(x)


.. _wildcard_pattern:

Wildcard Pattern
~~~~~~~~~~~~~~~~

The wildcard pattern is a special case of a 'capture' pattern: it accepts
any value, but does not bind it to a variable. The idea behind this rule
is to support repeated use of the wildcard in patterns. While ``(x, x)``
is an error, ``(_, _)`` is legal.

Particularly in larger (sequence) patterns, it is important to allow the
pattern to concentrate on values with actual significance while ignoring
anything else. Without a wildcard, it would become necessary to 'invent'
a number of local variables, which would be bound but never used. Even
when sticking to naming conventions and using e.g. ``_1, _2, _3`` to name
irrelevant values, say, this still introduces visual clutter and can hurt
performance (compare the sequence pattern ``(x, y, *z)`` to ``(_, y, *_)``,
where the ``*z`` forces the interpreter to copy a potentially very long
sequence, whereas the second version simply compiles to code along the
lines of ``y = seq[1]``).

There has been much discussion about the choice of the underscore ``_``
as a wildcard pattern, i.e. making this one name non-binding. However, the
underscore is already heavily used as an 'ignore value' marker in iterable
unpacking. Since the wildcard pattern ``_`` never binds, this use of the
underscore does not interfere with other uses such as inside the REPL or
the ``gettext`` module.

It has been proposed to use ``...`` (i.e., the ellipsis token) or ``*``
(star) as a wildcard. However, both these look as if an arbitrary number
of items is omitted::

    case [a, ..., z]: ...
    case [a, *, z]: ...

Both look like they would match a sequence of two or more items,
capturing the first and last values.

A single wildcard clause (i.e. ``case _:``) is semantically equivalent to
an ``else:``. It accepts any subject without binding it to a variable or
performing any other operation. However, unlike ``else``, the wildcard
pattern can also be used as a subpattern in nested patterns.

Finally, note that the underscore is used as a wildcard pattern in *every*
programming language with pattern matching that we could find
(including *C#*, *Elixir*, *Erlang*, *F#*, *Grace*, *Haskell*,
*Mathematica*, *OCaml*, *Ruby*, *Rust*, *Scala*, *Swift*, and *Thorn*).
Keeping in mind that many users of Python also work with other programming
languages, have prior experience when learning Python, or move on to
other languages after having learned Python, we find that such well
established standards are important and relevant with respect to
readability and learnability. In our view, concerns that this wildcard
means that a regular name receives special treatment are not strong
enough to introduce syntax that would make Python special.

Example::

    def is_closed(sequence):
        match sequence:
            case [_]:               # any sequence with a single element
                return True
            case [start, *_, end]:  # a sequence with at least two elements
                return start == end
            case _:                 # anything
                return False


.. _literal_pattern:

Literal Patterns
~~~~~~~~~~~~~~~~

Literal patterns are a convenient way for imposing constraints on the
value of a subject, rather than its type or structure. Literal patterns
even allow you to emulate a switch statement using pattern matching.

Generally, the subject is compared to a literal pattern by means of standard
equality (``x == y`` in Python syntax). Consequently, the literal patterns
``1.0`` and ``1`` match exactly the same set of objects, i.e. ``case 1.0:``
and ``case 1:`` are fully interchangeable. In principle, ``True`` would also
match the same set of objects because ``True == 1`` holds. However, we
believe that many users would be surprised to find that ``case True:``
matched the object ``1.0``, resulting in some subtle bugs and convoluted
workarounds. We therefore adopted the rule that the three singleton
objects ``None``, ``False`` and ``True`` match by identity (``x is y`` in
Python syntax) rather than equality. Hence, ``case True:`` will match only
``True`` and nothing else. Note that ``case 1:`` would still match ``True``,
though, because the literal pattern ``1`` works by equality and not identity.

Early ideas to induce a hierarchy on numbers so that ``case 1.0`` would
match both the integer ``1`` and the floating point number ``1.0``, whereas
``case 1:`` would only match the integer ``1``, were eventually dropped in
favor of the simpler and consistent rule based on equality. Moreover, any
additional checks whether the subject is an instance of ``numbers.Integral``
would come at a high runtime cost to introduce what would essentially be
a novelty in Python. When needed, the explicit syntax ``case int(1):`` might
be used.

Recall that literal patterns are *not* expressions, but directly denote a
specific value or object. From a syntactical point of view, we have to
ensure that negative and complex numbers can equally be used as patterns,
although they are not atomic literal values (i.e. the seeming literal value
``-3+4j`` would syntactically be an expression of the form
``BinOp(UnaryOp('-', 3), '+', 4j)``, but as expressions are not part of
patterns, we added syntactic support for such complex value literals without
having to resort to full expressions). Interpolated *f*-strings, on the
other hand, are not literal values, despite their appearance, and therefore
cannot be used as literal patterns (string concatenation, however,
is supported).

Literal patterns not only occur as patterns in their own right, but also
as keys in *mapping patterns*.

Example::

    def simplify(expr):
        match expr:
            case ('+', 0, x):
                return x
            case ('+' | '-', x, 0):
                return x
            case ('and', True, x):
                return x
            case ('and', False, x):
                return False
            case ('or', False, x):
                return x
            case ('or', True, x):
                return True
            case ('not', ('not', x)):
                return x
        return expr


.. _constant_value_pattern:

Constant Value Patterns
~~~~~~~~~~~~~~~~~~~~~~~

It is good programming style to use named constants for parametric values or
to clarify the meaning of particular values. Clearly, it would be desirable
to also write ``case (HttpStatus.OK, body):`` rather than
``case (200, body):``, for example. The main issue that arises here is how to
distinguish capture patterns (variables) from constant value patterns. The
general discussion surrounding this issue has brought forward a plethora of
options, which we cannot all fully list here.

Strictly speaking, constant value patterns are not really necessary, but
could be implemented using guards, i.e.
``case (status, body) if status == HttpStatus.OK:``. Nonetheless, the
convenience of constant value patterns is unquestioned and obvious.

The observation that constants tend to be written in uppercase letters or
collected in enumeration-like namespaces suggests possible rules to discern
constants syntactically. However, the idea of using upper vs. lower case as
a marker has been met with scepticism since there is no similar precedent
in core Python (although it is common in other languages). We therefore only
adopted the rule that any dotted name (i.e. attribute access) is to be
interpreted as a constant value pattern like ``HttpStatus.OK``
above. This precludes, in particular, local variables from acting as
constants.

Global variables can only be directly used as constants when defined in other
modules, although there are workarounds to access the current module as a
namespace as well. A proposed rule to use a leading dot (e.g.
``.CONSTANT``) for that purpose was criticized because it was felt that the
dot would not be a visible-enough marker for that purpose. Partly inspired
by use cases in other programming languages, a number of different
markers/sigils were proposed (such as ``^CONSTANT``, ``$CONSTANT``,
``==CONSTANT``, ``CONSTANT?``, or the word enclosed in backticks), although
there was no obvious or natural choice. The current proposal therefore
leaves the discussion and possible introduction of such a 'constant' marker
for future PEPs.

Distinguishing the semantics of names based on whether they are global
variables (i.e. the compiler would treat global variables as constants rather
than capture patterns) leads to various issues. The addition or alteration
of a global variable in the module could have unintended side effects on
patterns. Moreover, pattern matching could not be used directly inside a
module's scope because all variables would be global, making capture
patterns impossible.

Example::

    def handle_reply(reply):
        match reply:
            case (HttpStatus.OK, MimeType.TEXT, body):
                process_text(body)
            case (HttpStatus.OK, MimeType.APPL_ZIP, body):
                text = deflate(body)
                process_text(text)
            case (HttpStatus.MOVED_PERMANENTLY, new_URI):
                resend_request(new_URI)
            case (HttpStatus.NOT_FOUND):  # a bare status; parentheses only group
                raise ResourceNotFound()


Group Patterns
~~~~~~~~~~~~~~

Allowing users to explicitly specify the grouping is particularly helpful
in case of OR patterns.


.. _sequence_pattern:

Sequence Patterns
~~~~~~~~~~~~~~~~~

Sequence patterns follow as closely as possible the already established
syntax and semantics of iterable unpacking. Of course, subpatterns take
the place of assignment targets (variables, attributes and subscripts).
Moreover, the sequence pattern only matches a carefully selected set of
possible subjects, whereas iterable unpacking can be applied to any
iterable.

- As in iterable unpacking, we do not distinguish between 'tuple' and
  'list' notation. ``[a, b, c]``, ``(a, b, c)`` and ``a, b, c`` are all
  equivalent. While this means we have a redundant notation and checking
  specifically for lists or tuples requires more effort (e.g.
  ``case list([a, b, c])``), we mimic iterable unpacking as much as
  possible.

- A starred pattern will capture a sub-sequence of arbitrary length,
  mirroring iterable unpacking as well. Only one starred item may be
  present in any sequence pattern. In theory, patterns such as ``(*_, 3, *_)``
  could be understood as expressing any sequence containing the value ``3``.
  In practice, however, this would only work for a very narrow set of use
  cases and lead to inefficient backtracking or even ambiguities otherwise.

- The sequence pattern does *not* iterate through an iterable subject. All
  elements are accessed through subscripting and slicing, and the subject must
  be an instance of ``collections.abc.Sequence`` (including, in particular,
  lists and tuples, but excluding strings and bytes, as well as sets and
  dictionaries).

A sequence pattern cannot just iterate through any iterable object. The
consumption of elements from the iteration would have to be undone if the
overall pattern fails, which is not possible.

Relying on ``len()`` and subscripting and slicing alone does not work to
identify sequences because sequences share the protocol with more general
maps (dictionaries) in this regard. It would be surprising if a sequence
pattern also matched dictionaries or other custom objects that implement
the mapping protocol (i.e. ``__getitem__``). The interpreter therefore
performs an instance check to ensure that the subject in question really
is a sequence (of known type).

String and bytes objects have a dual nature: they are both 'atomic' objects
in their own right, as well as sequences (with a strongly recursive nature
in that a string is a sequence of strings). The typical behavior and use
cases for strings and bytes are different enough from those of tuples and
lists to warrant a clear distinction. It is in fact often unintuitive and
unintended that strings pass for sequences, as evidenced by regular questions
and complaints. Strings and bytes are therefore not matched by a sequence
pattern, limiting the sequence pattern to a very specific understanding of
'sequence'.


.. _mapping_pattern:

Mapping Patterns
~~~~~~~~~~~~~~~~

Dictionaries or mappings in general are one of the most important and most
widely used data structures in Python. In contrast to sequences, mappings
are built for fast direct access to arbitrary elements (identified by a key).
In most use cases an element is retrieved from a dictionary by a known key
without regard for any ordering or other key-value pairs stored in the same
dictionary. Particularly common are string keys.

The mapping pattern reflects the common usage of dictionary lookup: it allows
the user to extract some values from a mapping by means of constant/known
keys and have the values match given subpatterns. Moreover, the mapping
pattern does not check for the presence of additional keys. Should it be
necessary to impose an upper bound on the mapping and ensure that no
additional keys are present, then the usual double-star-pattern ``**rest``
can be used. The special case ``**_`` with a wildcard, however, is not
supported as it would not have any effect, but might lead to a wrong
understanding of the mapping pattern's semantics.

To avoid overly expensive matching algorithms, keys must be literals or
constant values.

Example::

    def change_red_to_blue(json_obj):
        match json_obj:
            case { 'color': ('red' | '#FF0000') }:
                json_obj['color'] = 'blue'
            case { 'children': children }:
                for child in children:
                    change_red_to_blue(child)


.. _class_pattern:

Class Patterns
~~~~~~~~~~~~~~

Class patterns fulfil two purposes: checking whether a given subject is
indeed an instance of a specific class and extracting data from specific
attributes of the subject. A quick survey revealed that ``isinstance()``
is indeed one of the most often used functions in Python in terms of
static occurrences in programs. Such instance checks typically precede
a subsequent access to information stored in the object, or a possible
manipulation thereof. A typical pattern might be along the lines of::

    def traverse_tree(node):
        if isinstance(node, Node):
            traverse_tree(node.left)
            traverse_tree(node.right)
        elif isinstance(node, Leaf):
            print(node.value)

In many cases, however, class patterns occur nested as in the example
given in the motivation::

    if (isinstance(node, BinOp) and node.op == "+"
            and isinstance(node.right, BinOp) and node.right.op == "*"):
        a, b, c = node.left, node.right.left, node.right.right
        # Handle a + b*c

The class pattern lets you concisely specify both an instance check as
well as relevant attributes (with possible further constraints). It is
therefore very tempting to write, e.g., ``case Node(left, right):`` in the
first case above and ``case Leaf(value):`` in the second. While this
indeed works well for languages with strict algebraic data types, it is
problematic with the structure of Python objects.

When dealing with general Python objects, we face a potentially very large
number of unordered attributes: an instance of ``Node`` contains a large
number of attributes (most of which are 'special methods' such as, e.g.,
``__repr__``). Moreover, the interpreter cannot reliably deduce which of
the attributes comes first and which comes second. For an object that
represents a circle, say, there is no inherently obvious ordering of the
attributes ``x``, ``y`` and ``radius``.

We envision two possibilities for dealing with this issue: either explicitly
name the attributes of interest or provide an additional mapping that tells
the interpreter which attributes to extract and in which order. Both
approaches are supported. Moreover, explicitly naming the attributes of
interest lets you further specify the required structure of an object; if
an object lacks an attribute specified by the pattern, the match fails.
|
||||
|
||||
- Attributes that are explicitly named pick up the syntax of named arguments.
|
||||
If an object of class ``Node`` has two attributes ``left`` and ``right``
|
||||
as above, the pattern ``Node(left=x, right=y)`` will extract the values of
|
||||
both attributes and assign them to ``x`` and ``y``, respectively. The data
|
||||
flow from left to right seems unusual, but is in line with mapping patterns
|
||||
and has precedents such as assignments via ``as`` in *with*- or
|
||||
*import*-statements.
|
||||
|
||||
Naming the attributes in question explicitly will be mostly used for more
|
||||
complex cases where the positional form (below) is insufficient.
|
||||
|
||||
- The class field ``__match_args__`` specifies a number of attributes
|
||||
together with their ordering, allowing class patterns to rely on positional
|
||||
sub-patterns without having to explicitly name the attributes in question.
|
||||
This is particularly handy for smaller objects or instances of data classes,
|
||||
where the attributes of interest are rather obvious and often have a
|
||||
well-defined ordering. In a way, ``__match_args__`` is similar to the
|
||||
declaration of formal parameters, which allows to call functions with
|
||||
positional arguments rather than naming all the parameters.
|
||||
|
||||
|
||||
The syntax of class patterns is based on the idea that de-construction
mirrors the syntax of construction. This is already the case in virtually
any Python construct, be it assignment targets, function definitions or
iterable unpacking. In all these cases, we find that the syntax for
sending and that for receiving 'data' are virtually identical.

- Assignment targets such as variables, attributes and subscripts:
  ``foo.bar[2] = foo.bar[3]``;

- Function definitions: a function defined with ``def foo(x, y, z=6)``
  is called as, e.g., ``foo(123, y=45)``, where the actual arguments
  provided at the call site are matched against the formal parameters
  at the definition site;

- Iterable unpacking: ``a, b = b, a`` or ``[a, b] = [b, a]`` or
  ``(a, b) = (b, a)``, just to name a few equivalent possibilities.

Using the same syntax for reading and writing, l- and r-values, or
construction and de-construction is widely accepted for its benefits in
thinking about data, its flow and manipulation. This equally extends to
the explicit construction of instances, where class patterns ``c(p, q)``
deliberately mirror the syntax of creating instances.

History and Context
===================

Pattern matching emerged in the late 1970s in the form of tuple unpacking
and as a means to handle recursive data structures such as linked lists or
trees (object-oriented languages usually use the visitor pattern for handling
recursive data structures). The early proponents of pattern matching
organised structured data in 'tagged tuples' rather than ``struct`` as in
*C* or the objects introduced later. A node in a binary tree would, for
instance, be a tuple with two elements for the left and right branches,
respectively, and a ``Node`` tag, written as ``Node(left, right)``. In
Python we would probably put the tag inside the tuple as
``('Node', left, right)`` or define a data class ``Node`` to achieve the
same effect.

Using modern syntax, a depth-first tree traversal would then be written as
follows::

    def traverse_tree(node):
        match node:
            case Node(left, right):
                traverse_tree(left)
                traverse_tree(right)
            case Leaf(value):
                handle(value)

The notion of handling recursive data structures with pattern matching
immediately gave rise to the idea of handling more general recursive
'patterns' (i.e. recursion beyond recursive data structures)
with pattern matching. Pattern matching would thus also be used to define
recursive functions such as::

    def fib(arg):
        match arg:
            case 0:
                return 1
            case 1:
                return 1
            case n:
                return fib(n-1) + fib(n-2)

As pattern matching was repeatedly integrated into new and emerging
programming languages, its syntax slightly evolved and expanded. The
first two cases in the ``fib`` example above could be written more succinctly
as ``case 0 | 1:`` with ``|`` denoting alternative patterns. Moreover, the
underscore ``_`` was widely adopted as a wildcard, a filler where neither
the structure nor the value of parts of a pattern were of substance. Since the
underscore is already frequently used in an equivalent capacity in Python's
iterable unpacking (e.g., ``_, _, third, *_ = something``) we kept these
universal standards.

It is noteworthy that the concept of pattern matching has always been
closely linked to the concept of functions. The different case clauses
have always been considered as something like semi-independent functions
where pattern variables take on the role of parameters. This becomes
most apparent when pattern matching is written as an overloaded function,
along the lines of (Standard ML)::

    fun fib 0 = 1
      | fib 1 = 1
      | fib n = fib (n-1) + fib (n-2)

Even though such a strict separation of case clauses into independent
functions does not make sense in Python, we find that patterns share many
syntactic rules with parameters, such as binding arguments to unqualified
names only, or the rule that variable/parameter names must not be repeated
within a particular pattern/function.

With its emphasis on abstraction and encapsulation, object-oriented
programming posed a serious challenge to pattern matching. In short: in
object-oriented programming, we can no longer view objects as tagged tuples.
The arguments passed into the constructor do not necessarily specify the
attributes or fields of the objects. Moreover, there is no longer a strict
ordering of an object's fields and some of the fields might be private and
thus inaccessible. And on top of this, the given object might actually be
an instance of a subclass with slightly different structure.

To address this challenge, patterns became increasingly independent of the
original tuple constructors. In a pattern like ``Node(left, right)``,
``Node`` is no longer a passive tag, but rather a function that can actively
check for any given object whether it has the right structure and extract a
``left`` and ``right`` field. In other words: the ``Node``-tag becomes a
function that transforms an object into a tuple or returns some failure
indicator if it is not possible.

In Python, we simply use ``isinstance()`` together with the ``__match_args__``
field of a class to check whether an object has the correct structure and
then transform some of its attributes into a tuple. For the ``Node`` example
above, for instance, we would have ``__match_args__ = ('left', 'right')`` to
indicate that these two attributes should be extracted to form the tuple.
That is, ``case Node(x, y)`` would first check whether a given object is an
instance of ``Node`` and then assign ``left`` to ``x`` and ``right`` to ``y``,
respectively.

Paying tribute to Python's dynamic nature with 'duck typing', however, we
also added a more direct way to specify the presence of, or constraints on,
specific attributes. Instead of ``Node(x, y)`` you could also write
``object(left=x, right=y)``, effectively eliminating the ``isinstance()``
check and thus supporting any object with ``left`` and ``right`` attributes.
Or you could combine these ideas to write ``Node(right=y)`` so as to require
an instance of ``Node`` but only extract the value of the ``right`` attribute.

Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:

PEP: 636
Title: Structural Pattern Matching: Tutorial
Version: $Revision$
Last-Modified: $Date$
Author: Daniel F Moisset <dfmoisset@gmail.com>,
        Tobias Kohn <kohnt@tobiaskohn.ch>
Sponsor: Guido van Rossum <guido@python.org>
BDFL-Delegate:
Discussions-To: Python-Dev <python-dev@python.org>
Status: Draft
Type: Informational
Content-Type: text/x-rst
Created: 12-Sep-2020
Python-Version: 3.10
Post-History:
Resolution:

Abstract
========

**NOTE:** This draft is incomplete and not intended for review yet.
We're checking it into the peps repo for the convenience of the authors.

This PEP is a tutorial for the pattern matching introduced by PEP 634.

PEP 622 proposed syntax for pattern matching, which received detailed discussion
both from the community and the Steering Council. A frequent concern was
about how easy it would be to explain (and learn) this feature. This PEP
addresses that concern by providing the kind of document which developers could use
to learn about pattern matching in Python.

This is considered supporting material for PEP 634 (the technical specification
for pattern matching) and PEP 635 (the motivation and rationale for having pattern
matching and design considerations).

Meta
====

This section is intended to get in sync about style and language with
co-authors. It should be removed from the released PEP.

The following are design decisions I made while writing this:

1. Who is the target audience?
   I'm considering "People with general Python experience" (i.e. who shouldn't be surprised
   at anything in the Python tutorial), but not necessarily involved with the
   design/development of Python. I'm assuming someone who hasn't been exposed to pattern
   matching in other languages.

2. How detailed should this document be?
   I considered a range from "very superficial" (like the detail level you might find about
   statements in the Python tutorial) to "terse but complete" like
   https://github.com/gvanrossum/patma/#tutorial
   to "long and detailed". I chose the latter; we can always trim down from that.

3. What kind of examples to use?
   I tried to write examples that look like code I might write using pattern matching.
   I avoided going for a full application (because the examples I have in mind are too
   large for a PEP) but I tried to follow ideas related to a single project to thread the
   story-telling more easily. This is probably the most controversial thing here, and if
   the rest of the authors dislike it, we can change to a more formal explanatory style.

Other rules I'm following (let me know if I forgot to):

* I'm not going to reference/compare with other languages
* I'm not trying to convince the reader that this is a good idea (that's the job of
  PEP 635), just explain how to use it
* I'm not trying to cover every corner case (that's the job of PEP 634), just cover
  how to use the full functionality in the "normal" cases
* I talk to the learner in second person

Tutorial
========

As an example to motivate this tutorial, you will be writing a text adventure. That is
a form of interactive fiction where the user enters text commands to interact with a
fictional world and receives text descriptions of what happens. Commands will be
simplified forms of natural language like ``get sword``, ``attack dragon``, ``go north``,
``enter shop`` or ``buy cheese``.

Matching sequences
------------------

Your main loop will need to get input from the user and split it into words, let's say
a list of strings like this::

    command = input("What are you doing next? ")
    # analyze the result of command.split()

The next step is to interpret the words. Most of our commands will have two words: an
action and an object. So you may be tempted to do the following::

    [action, obj] = command.split()
    ... # interpret action, obj

The problem with that line of code is that it's missing something: what if the user
types more or fewer than 2 words? To prevent this problem you can either check the length
of the list of words, or catch the ``ValueError`` that the statement above would raise.

You can use a ``match`` statement instead::

    match command.split():
        case [action, obj]:
            ... # interpret action, obj

The ``match`` statement evaluates the **subject** after the ``match`` keyword, and checks
it against the **pattern** next to ``case``. A pattern is able to do two different
things:

* Verify that the subject has a certain structure. In your case, the ``[action, obj]``
  pattern matches any sequence of exactly two elements. This is called **matching**.
* Bind some names in the pattern to component elements of your subject. In
  this case, if the list has two elements, it will bind ``action = subject[0]`` and
  ``obj = subject[1]``. This is called **destructuring**.

If there's a match, the statements inside the ``case`` clause will be executed with the
bound variables. If there's no match, nothing happens and execution continues with the
statement after the ``match`` statement.

TODO: discuss other sequences, tuples. Discuss syntax with parentheses. Discuss
iterators? Discuss [x, x] possibly later on?

Matching multiple patterns
--------------------------

Even if most commands have the action/object form, you might want to have user commands
of different lengths. For example, you might want to add single verbs with no object like
``look`` or ``quit``. A match statement can (and is likely to) have more than one
``case``::

    match command.split():
        case [action]:
            ... # interpret single-verb action
        case [action, obj]:
            ... # interpret action, obj

The ``match`` statement will check patterns from top to bottom. If the pattern doesn't
match the subject, the next pattern will be tried. However, once the *first*
matching ``case`` clause is found, the body of that clause is executed, and all further
``case`` clauses are ignored. This is similar to the way that an ``if/elif/elif/...``
statement works.

Matching specific values
------------------------

Your code still needs to look at the specific actions and conditionally run
different logic depending on the specific action (e.g., ``quit``, ``attack``, or ``buy``).
You could do that using a chain of ``if/elif/elif/...``, or using a dictionary of
functions, but here we'll leverage pattern matching to solve that task. Instead of a
variable, you can use literal values in patterns (like ``"quit"``, ``42``, or ``None``).
This allows you to write::

    match command.split():
        case ["quit"]:
            print("Goodbye!")
            quit_game()
        case ["look"]:
            current_room.describe()
        case ["get", obj]:
            character.get(obj, current_room)
        case ["go", direction]:
            current_room = current_room.neighbor(direction)
        # The rest of your commands go here

A pattern like ``["get", obj]`` will match only 2-element sequences that have a first
element equal to ``"get"``. When destructuring, it will bind ``obj = subject[1]``.

As you can see in the ``go`` case, we can also use different variable names in
different patterns.

FIXME: This *might* be the place to explain a bit that when I say "literal" I mean it
literally, and a "soft constant" will not work :)

Matching slices
---------------

A player may be able to drop multiple objects by using a series of commands
``drop key``, ``drop sword``, ``drop cheese``. This interface might be cumbersome, and
you might like to allow dropping multiple items in a single command, like
``drop key sword cheese``. In this case you don't know beforehand how many words will
be in the command, but you can use extended unpacking in patterns in the same way that
it is allowed in assignments::

    match command.split():
        case ["drop", *objects]:
            for obj in objects:
                character.drop(obj, current_room)
        # The rest of your commands go here

This will match any sequence having ``"drop"`` as its first element. All remaining
elements will be captured in a ``list`` object which will be bound to the ``objects``
variable.

This syntax has similar restrictions as sequence unpacking: you cannot have more than one
starred name in a pattern.

Adding a catch-all
------------------

You may want to print an error message saying that the command wasn't recognized when
all the patterns fail. You could use the feature we just learned and write the
following::

    match command.split():
        case ["quit"]: ... # Code omitted for brevity
        case ["go", direction]: ...
        case ["drop", *objects]: ...
        ... # Other case clauses
        case [*ignored_words]:
            print(f"Sorry, I couldn't understand {command!r}")

Note that you must add this last pattern at the end, otherwise it will match before other
possible patterns that could be considered. This works but it's a bit verbose and
somewhat wasteful: this will make a full copy of the word list, which will be bound to
``ignored_words`` even if it's never used.

You can instead use a special pattern, written ``_``, which always matches but
doesn't bind anything. This allows you to rewrite the last case as::

    match command.split():
        ... # Other case clauses
        case [*_]:
            print(f"Sorry, I couldn't understand {command!r}")

This pattern will match any sequence. In this case we can simplify even more and
match any object::

    match command.split():
        ... # Other case clauses
        case _:
            print(f"Sorry, I couldn't understand {command!r}")

TODO: Explain about syntaxerror when having an irrefutable pattern above others?

How patterns are composed
-------------------------

This is a good moment to step back from the examples and understand how the patterns
that you have been using are built. Patterns can be nested within each other, and we
have been doing that implicitly in the examples above.

There are some "simple" patterns ("simple" here meaning that they do not contain other
patterns) that we've seen:

* **Literal patterns** (string literals, number literals, ``True``, ``False``, and
  ``None``)
* The **wildcard pattern** ``_``
* **Capture patterns** (stand-alone names like ``direction``, ``action``, ``objects``). We
  never discussed these separately, but used them as part of other patterns. Note that
  a capture pattern by itself will always match, and usually makes sense only
  as a catch-all at the end of your ``match`` if you desire to bind the name to the
  subject.

Until now, the only non-simple pattern we have experimented with is the sequence pattern.
Each element in a sequence pattern can in fact be
any other pattern. This means that you could write a pattern like
``["first", (left, right), *rest]``. This will match subjects which are a sequence of at
least two elements, where the first one is equal to ``"first"`` and the second one is
in turn a sequence of two elements. It will also bind ``left = subject[1][0]``,
``right = subject[1][1]``, and ``rest = subject[2:]``.

Alternate patterns
------------------

Going back to the adventure game example, you may find that you'd like to have several
patterns resulting in the same outcome. For example, you might want the commands
``north`` and ``go north`` to be equivalent. You may also desire to have aliases for
``get X``, ``pick up X`` and ``pick X up`` for any X.

The ``|`` symbol in patterns combines them as alternatives. You could for example write::

    match command.split():
        ... # Other case clauses
        case ["north"] | ["go", "north"]:
            current_room = current_room.neighbor("north")
        case ["get", obj] | ["pick", "up", obj] | ["pick", obj, "up"]:
            ... # Code for picking up the given object

This is called an **or pattern** and will produce the expected result. Patterns are
attempted from left to right; this may be relevant to know what is bound if more than
one alternative matches. An important restriction when writing or patterns is that all
alternatives must bind the same variables. So a pattern ``[1, x] | [2, y]`` is not
allowed because it would make it unclear which variable would be bound after a successful
match. ``[1, x] | [2, x]`` is perfectly fine and will always bind ``x`` if successful.

Capturing matched sub-patterns
------------------------------

The first version of our "go" command was written with a ``["go", direction]`` pattern.
The change we made in our last version using the pattern ``["north"] | ["go", "north"]``
has some benefits but also some drawbacks in comparison: the latest version allows the
alias, but also has the direction hardcoded, which will force us to actually have
separate patterns for north/south/east/west. This leads to some code duplication, but at
the same time we get better input validation, and we will not be getting into that
branch if the command entered by the user is ``"go figure!"`` instead of a direction.

We could try to get the best of both worlds doing the following (I'll omit the aliased
version without "go" for brevity)::

    match command.split():
        case ["go", ("north" | "south" | "east" | "west")]:
            current_room = current_room.neighbor(...)
            # how do I know which direction to go?

This code is a single branch, and it verifies that the word after "go" is really a
direction. But the code moving the player around needs to know which one was chosen and
has no way to do so. What we need is a pattern that behaves like the or pattern but at
the same time does a capture. We can do so with a **walrus pattern**::

    match command.split():
        case ["go", direction := ("north" | "south" | "east" | "west")]:
            current_room = current_room.neighbor(direction)

The walrus pattern (named like that because the ``:=`` operator looks like a sideways
walrus) matches whatever pattern is on its right-hand side, but also binds the value to
a name.

Adding conditions to patterns
-----------------------------

The patterns we have explored above can do some powerful data filtering, but sometimes
you may wish for the full power of a boolean expression. Let's say that you would actually
like to allow a "go" command only in a restricted set of directions based on the possible
exits from the ``current_room``. We can achieve that by adding a **guard** to our
case clause. Guards consist of the ``if`` keyword followed by any expression::

    match command.split():
        case ["go", direction] if direction in current_room.exits:
            current_room = current_room.neighbor(direction)
        case ["go", _]:
            print("Sorry, you can't go that way")

The guard is not part of the pattern, it's part of the case clause. It's only checked if
the pattern matches, and after all the pattern variables have been bound (that's why the
condition can use the ``direction`` variable in the example above). If the pattern
matches and the condition is truthy, the body of the case clause runs normally. If the
pattern matches but the condition is falsy, the match statement proceeds to check the
next ``case`` clause as if the pattern hadn't matched (with the possible side-effect of
having already bound some variables).

The sequence of these steps must be considered carefully when combining or-patterns and
guards. If you have ``case [x, 100] | [0, x] if x > 10`` and your subject is
``[0, 100]``, the clause will be skipped. This happens because:

* The or-pattern finds the first alternative that matches the subject, which happens to
  be ``[x, 100]``
* ``x`` is bound to 0
* The condition ``x > 10`` is checked. Given that it's false, the whole case clause is
  skipped. The ``[0, x]`` pattern is never attempted.

Going to the cloud: Mappings
----------------------------

TODO: Give the motivating example of network requests, describe JSON based "protocol"

TODO: partial matches, double stars

Matching objects
----------------

UI events motivations. Describe events in dataclasses. Inspiration for event objects
can be taken from https://www.pygame.org/docs/ref/event.html

Example of getting constants from module (like key names for keyboard events)

Customizing ``__match_args__``?

Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: