python-peps/pep-0634.rst

654 lines
21 KiB
ReStructuredText

PEP: 634
Title: Structural Pattern Matching: Specification
Version: $Revision$
Last-Modified: $Date$
Author: Brandt Bucher <brandtbucher@gmail.com>,
Guido van Rossum <guido@python.org>
BDFL-Delegate:
Discussions-To: Python-Dev <python-dev@python.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 12-Sep-2020
Python-Version: 3.10
Post-History:
Replaces: 622
Resolution:
Abstract
========
**NOTE:** This draft is incomplete and not intended for review yet.
We're checking it into the peps repo for the convenience of the authors.
This PEP provides the technical specification for the ``match``
statement. It replaces PEP 622, which is hereby split in three parts:
- PEP 634: Specification
- PEP 635: Motivation and Rationale
- PEP 636: Tutorial
This PEP is intentionally devoid of commentary; all explanations of
design choices are in PEP 635. First-time readers are encouraged to
start with PEP 636, which provides a gentler introduction to the
concepts, syntax and semantics of patterns.
TODO: Maybe we should add simple examples back to each section?
There's no rule saying a spec can't include examples, and currently
it's *very* dry.
TODO: Go over the feedback from the SC and make sure everything's
somehow incorporated (either here or in PEP 635, which has to answer
why we didn't budge on most of the SC's initial requests).
Syntax and Semantics
====================
See `Appendix A`_ for the complete grammar.
Overview and terminology
------------------------
The pattern matching process takes as input a pattern (following
``case``) and a subject value (following ``match``). Phrases to
describe the process include "the pattern is matched with (or against)
the subject value" and "we match the pattern against (or with) the
subject value".
The primary outcome of pattern matching is success or failure. In
case of success we may say "the pattern succeeds", "the match
succeeds", or "the pattern matches the subject value".
In many cases a pattern contains subpatterns, and success or failure
is determined by the success or failure of matching those subpatterns
against the value (e.g., for OR patterns) or against parts of the
value (e.g., for sequence patterns). This process typically processes
the subpatterns from left to right until the overall outcome is
determined. E.g., an OR pattern succeeds at the first succeeding
subpattern, while a sequence patterns fails at the first failing
subpattern.
A secondary outcome of pattern matching may be one or more name
bindings. We may say "the pattern binds a value to a name". When
subpatterns tried until the first success, only the bindings due to
the successful subpattern are valid; when trying until the first
failure, the bindings are merged. Several more rules, explained
below, apply to these cases.
The ``match`` statement
-----------------------
Syntax::
match_stmt: "match" match_expr ':' NEWLINE INDENT case_block+ DEDENT
match_expr:
| star_named_expression ',' star_named_expressions?
| named_expression
case_block: "case" patterns [guard] ':' block
guard: 'if' named_expression
The rules ``star_named_expression``, ``star_named_expressions``,
``named_expression`` and ``block`` are part of the `standard Python
grammar <https://docs.python.org/3.10/reference/grammar.html>`_.
The rule ``patterns`` is specified below.
For context, ``match_stmt`` is a new alternative for
``compound_statement``::
compound_statement:
| if_stmt
...
| match_stmt
The ``match`` and ``case`` keywords are soft keywords, i.e. they are
not reserved words in other grammatical contexts (including at the
start of a line if there is no colon where expected). This implies
that they are recognized as keywords when part of a ``match``
statement or ``case`` block only, and are allowed to be used in all
other context as variable or argument names.
Match semantics
^^^^^^^^^^^^^^^
TODO: Make the language about choosing a block more precise.
The overall semantics for choosing the match is to choose the first
matching pattern (including guard) and execute the corresponding
block. The remaining patterns are not tried. If there are no
matching patterns, execution continues at the following statement.
Name bindings made during a successful pattern match outlive the
executed block and can be used after the ``match`` statement.
During failed pattern matches, some subpatterns may succeed. For
example, while matching the pattern ``(0, x, 1)`` with the value ``[0,
1, 2]``, the subpattern ``x`` may succeed if the list elements are
matched from left to right. The implementation may choose to either
make persistent bindings for those partial matches or not. User code
including a ``match`` statement should not rely on the bindings being
made for a failed match, but also shouldn't assume that variables are
unchanged by a failed match. This part of the behavior is left
intentionally unspecified so different implementations can add
optimizations, and to prevent introducing semantic restrictions that
could limit the extensibility of this feature.
The precise pattern binding rules vary per pattern type and are
specified below.
.. _guards:
Guards
^^^^^^
Syntax::
case_block: "case" patterns [guard] ':' block
guard: 'if' named_expression
If a guard is present on a case block, once all patterns succeed,
the expression in the guard is evaluated.
If this raises an exception, the exception bubbles up.
Otherwise, if the condition is "truthy" the block is selected;
if it is "falsy" the next case block (if any) is tried.
.. _patterns:
Patterns
--------
The top-level syntax for patterns is as follows::
patterns: open_sequence_pattern | pattern
pattern: walrus_pattern | or_pattern
walrus_pattern: capture_pattern ':=' or_pattern
or_pattern: '|'.closed_pattern+
closed_pattern:
| literal_pattern
| capture_pattern
| wildcard_pattern
| constant_pattern
| group_pattern
| sequence_pattern
| mapping_pattern
| class_pattern
Walrus patterns
^^^^^^^^^^^^^^^
TODO: Change to ``or_pattern 'as' capture_pattern`` (and rename)?
Syntax::
walrus_pattern: capture_pattern ':=' or_pattern
(Note: the name on the left may not be ``_``.)
A walrus pattern matches the OR pattern on the right of the ``:=``
operator against the subject. If this fails, the walrus pattern fails.
Otherwise, the walrus pattern binds the subject to the name on the left
of the ``:=`` operator and succeeds.
OR patterns
^^^^^^^^^^^
Syntax::
or_pattern: '|'.closed_pattern+
When two or more patterns are separated by vertical bars (``|``),
this is called an OR pattern. (A single closed pattern is just that.)
Each subpattern must bind the same set of names.
An OR pattern matches each of its subpatterns in turn to the subject,
until one succeeds. The OR pattern is then deemed to succeed.
If none of the subpatterns succeed the OR pattern fails.
.. _literal_pattern:
Literal Patterns
^^^^^^^^^^^^^^^^
Syntax::
literal_pattern:
| signed_number
| signed_number '+' NUMBER
| signed_number '-' NUMBER
| strings
| 'None'
| 'True'
| 'False'
signed_number: NUMBER | '-' NUMBER
The rule ``strings`` and the token ``NUMBER`` are defined in the
standard Python grammar.
Triple-quoted strings are supported. Raw strings and byte strings
are supported. F-strings are not supported.
The forms ``signed_number '+' NUMBER`` and ``signed_number '-'
NUMBER`` are only permitted to express complex numbers; they require a
real number on the left and an imaginary number on the right.
A literal pattern succeeds if the subject value compares equal to the
value expressed by the literal, using the following comparisons rules:
- Numbers and strings are compared using the ``==`` operator.
- The singleton literals ``None``, ``True`` and ``False`` are compared
using the ``is`` operator.
.. _capture_pattern:
Capture Patterns
^^^^^^^^^^^^^^^^
Syntax::
capture_pattern: !"_" NAME
The single underscore (``_``) is not a capture pattern (this is what
``!"_"`` expresses). It is treated as a `wildcard pattern`_.
A capture pattern always succeeds. It binds the subject value to the
name using the scoping rules for name binding established for the
walrus operator in PEP 572. (Summary: the name becomes a local
variable in the closest containing function scope unless there's an
applicable ``nonlocal`` or ``global`` statement.)
In a given pattern, a given name may be bound only once. This
disallows for example ``case x, x: ...`` but allows ``case [x] | x:
...``.
.. _wildcard_pattern:
Wildcard Pattern
^^^^^^^^^^^^^^^^
Syntax::
wildcard_pattern: "_"
A wildcard pattern always succeeds. It binds no name.
.. _constant_value_pattern:
Constant Value Patterns
^^^^^^^^^^^^^^^^^^^^^^^
TODO: Rename to Value Patterns? (But ``value[s]_pattern`` is already
a grammatical rule.)
Syntax::
constant_pattern: attr
attr: name_or_attr '.' NAME
name_or_attr: attr | NAME
The dotted name in the pattern is looked up using the standard Python
name resolution rules. However, when the same constant pattern occurs
multiple times in the same ``match`` statement, the interpreter may cache
the first value found and reuse it, rather than repeat the same
lookup. (To clarify, this cache is strictly tied to a given execution
of a given ``match`` statement.)
The pattern succeeds if the value found thus compares equal to the
subject value (using the ``==`` operator).
Group Patterns
^^^^^^^^^^^^^^
Syntax:
group_pattern: '(' pattern ')'
(For the syntax of ``pattern``, see Patterns above. Note that it
contains no comma -- a parenthesized series of items with at least one
comma is a sequence pattern, as is ``()``.)
A parenthesized pattern has no additional syntax. It allows users to
add parentheses around patterns to emphasize the intended grouping.
.. _sequence_pattern:
Sequence Patterns
^^^^^^^^^^^^^^^^^
Syntax::
sequence_pattern:
| '[' [values_pattern] ']'
| '(' [open_sequence_pattern] ')'
open_sequence_pattern: value_pattern ',' [values_pattern]
values_pattern: ','.value_pattern+ ','?
value_pattern: star_pattern | pattern
star_pattern: '*' (capture_pattern | wildcard_pattern)
(Note that a single parenthesized pattern without a trailing comma is
a group pattern, not a sequence pattern. However a single pattern
enclosed in ``[...]`` is still a sequence pattern.)
There is no semantic difference between a sequence pattern using
``[...]``, a sequence pattern using ``(...)``, and an open sequence
pattern.
A sequence pattern may contain at most one star subpattern. The star
subpattern may occur in any position. If no star subpattern is
present, the sequence pattern is a fixed-length sequence pattern;
otherwise it is a variable-length sequence pattern.
A sequence pattern fails if the subject value is not an instance of
``collections.abc.Sequence``. It also fails if the subject value is
an instance of ``str``, ``bytes`` or ``bytearray``.
A fixed-length sequence pattern fails if the length of the subject
sequence is not equal to the number of subpatterns.
A variable-length sequence pattern fails if the length of the subject
sequence is less than the number of non-star subpatterns.
The length of the subject sequence is obtained using the builtin
``len()`` function (i.e., via the ``__len__`` protocol). However, the
interpreter may cache this value in a similar manner as described for
constant value patterns.
A fixed-length sequence pattern matches the subpatterns to
corresponding items of the subject sequence, from left to right.
Matching stops (with a failure) as soon as a subpattern fails. If all
subpatterns succeed in matching their corresponding item, the sequence
pattern succeeds.
A variable-length sequence pattern first matches the leading non-star
subpatterns to the curresponding items of the subject sequence, as for
a fixed-length sequence. If this succeeds, the star subpattern
matches a list formed of the remaining subject items, with items
removed from the end corresponding to the non-star subpatterns
following the star subpattern. The remaining non-star subpatterns are
then matched to the corresponding subject items, as for a fixed-length
sequence.
.. _mapping_pattern:
Mapping Patterns
^^^^^^^^^^^^^^^^
Syntax::
mapping_pattern: '{' [items_pattern] '}'
items_pattern: ','.key_value_pattern+ ','?
key_value_pattern:
| (literal_pattern | constant_pattern) ':' or_pattern
| double_star_pattern
double_star_pattern: '**' capture_pattern
(Note that ``**_`` is disallowed by this syntax.)
A mapping pattern may contain at most one double star pattern,
and it must be last.
A mapping pattern may not contain duplicate key values.
(If all key patterns are literal patterns this is considered a
syntax error; otherwise this is a runtime error and will
raise ``TypeError``.)
A mapping pattern fails if the subject value is not an instance of
``collections.abc.Mapping``.
A mapping pattern succeeds if every key given in the mapping pattern
matches the corresponding item of the subject mapping. If a ``'**'
NAME`` form is present, that name is bound to a ``dict`` containing
remaining key-value pairs from the subject mapping.
If duplicate keys are detected in the mapping pattern, the pattern is
considered invalid, and a ``ValueError`` is raised.
Key-value pairs are matched using the two-argument form of the
subject's ``get()`` method. As a consequence, matched key-value pairs
must already be present in the mapping, and not created on-the-fly by
``__missing__`` or ``__getitem__``. For example,
``collections.defaultdict`` instances will only be matched by patterns
with keys that were already present when the ``match`` block was
entered.
.. _class_pattern:
Class Patterns
^^^^^^^^^^^^^^
Syntax::
class_pattern:
| name_or_attr '(' [pattern_arguments ','?] ')'
pattern_arguments:
| positional_patterns [',' keyword_patterns]
| keyword_patterns
positional_patterns: ','.pattern+
keyword_patterns: ','.keyword_pattern+
keyword_pattern: NAME '=' or_pattern
(Note that positional patterns may be unparenthesized walrus patterns,
but keyword patterns may not.)
A class pattern may not repeat the same keyword multiple times.
If ``name_or_attr`` is not an instance of the builtin ``type``,
``TypeError`` is raised.
A class pattern fails if the subject is not an instance of ``name_or_attr``.
This is tested using ``isinstance()``.
If no arguments are present, the pattern succeeds if the ``isinstance()``
check succeeds. Otherwise:
- If only keyword patterns are present, they are processed as follows,
one by one:
- The keyword is looked up as an attribute on the subject.
- If this raises an exception other than ``AttributeError``,
the exception bubbles up.
- If this raises ``AttributeError`` the class pattern fails.
- Otherwise, the subpattern associated with the keyword is matched
against the attribute value. If this fails, the class pattern fails.
If it succeeds, the match proceeds to the next keyword.
- If all keyword patterns succeed, the class pattern as a whole succeeds.
- If any positional patterns are present, they are converted to keyword
patterns (see below) and treated as additional keyword patterns,
preceding the syntactic keyword patterns (if any).
Positional patterns are converted to keyword patterns using the
``__match_args__`` attribute on the class designated by ``name_or_attr``,
as follows:
- For a number of built-in types (specified below),
a single positional subpattern is accepted which will match
the entire subject; for these types no keyword patterns are accepted.
- The equivalent of ``getattr(cls, "__match_args__", ()))`` is called.
- If this raises an exception the exception bubbles up.
- If the returned value is not a list or tuple, the conversion fails
and ``TypeError`` is raised.
- If there are more positional patterns than the length of
``__match_args__``` (as obtained using ``len()``), ``TypeError`` is raised.
- Otherwise, positional pattern ``i`` is converted to a keyword pattern
using ``__match_args__[i]`` as the keyword,
provided it the latter is a string;
if it is not, ``TypeError`` is raised.
- For duplicate keywords, ``TypeError`` is raised.
Once the positional patterns have been converted to keyword patterns,
the match proceeds as if there were only keyword patterns.
As mentioned above, for the following built-in types the handling of
positional subpatterns is different:
``bool``, ``bytearray``, ``bytes``, ``dict``, ``float``,
``frozenset``, ``int``, ``list``, ``set``, ``str``, and ``tuple``.
This behavior is roughly equivalent to the following::
class C:
__match_args__ = ["__match_self_prop__"]
@property
def __match_self_prop__(self):
return self
Side effects
============
The only side-effect produced explicitly by the matching process is
the binding of names. However, the process relies on attribute
access, instance checks, ``len()``, equality and item access on the
subject and some of its components. It also evaluates constant value
patterns and the class name of class patterns. While none of those
typically create any side-effects, in theory they could. This
proposal intentionally leaves out any specification of what methods
are called or how many times. This behavior is therefore undefined
and user code should not rely on it.
The standard library
====================
To facilitate the use of pattern matching, several changes will be
made to the standard library:
- Namedtuples and dataclasses will have auto-generated
``__match_args__``.
- For dataclasses the order of attributes in the generated
``__match_args__`` will be the same as the order of corresponding
arguments in the generated ``__init__()`` method. This includes the
situations where attributes are inherited from a superclass. Fields
with ``init=False`` are excluded from ``__match_args__``.
In addition, a systematic effort will be put into going through
existing standard library classes and adding ``__match_args__`` where
it looks beneficial.
.. _Appendix A:
Appendix A -- Full Grammar
==========================
TODO: Go over the differences with the reference implementation and
resolve them (either by fixing the PEP or by fixing the reference
implementation).
Here is the full grammar for ``match_stmt``. This is an additional
alternative for ``compound_stmt``. Remember that ``match`` and
``case`` are soft keywords, i.e. they are not reserved words in other
grammatical contexts (including at the start of a line if there is no
colon where expected). By convention, hard keywords use single quotes
while soft keywords use double quotes.
Other notation used beyond standard EBNF:
- ``SEP.RULE+`` is shorthand for ``RULE (SEP RULE)*``
- ``!RULE`` is a negative lookahead assertion
::
match_stmt: "match" match_expr ':' NEWLINE INDENT case_block+ DEDENT
match_expr:
| star_named_expression ',' [star_named_expressions]
| named_expression
case_block: "case" patterns [guard] ':' block
guard: 'if' named_expression
patterns: open_sequence_pattern | pattern
pattern: walrus_pattern | or_pattern
walrus_pattern: capture_pattern ':=' or_pattern
or_pattern: '|'.closed_pattern+
closed_pattern:
| literal_pattern
| capture_pattern
| wildcard_pattern
| constant_pattern
| group_pattern
| sequence_pattern
| mapping_pattern
| class_pattern
literal_pattern:
| signed_number !('+' | '-')
| signed_number '+' NUMBER
| signed_number '-' NUMBER
| strings
| 'None'
| 'True'
| 'False'
signed_number: NUMBER | '-' NUMBER
capture_pattern: !"_" NAME !('.' | '(' | '=')
wildcard_pattern: "_"
constant_pattern: attr !('.' | '(' | '=')
attr: name_or_attr '.' NAME
name_or_attr: attr | NAME
group_pattern: '(' pattern ')'
sequence_pattern:
| '[' [values_pattern] ']'
| '(' [open_sequence_pattern] ')'
open_sequence_pattern: value_pattern ',' [values_pattern]
values_pattern: ','.value_pattern+ ','?
value_pattern: star_pattern | pattern
star_pattern: '*' (capture_pattern | wildcard_pattern)
mapping_pattern: '{' [items_pattern] '}'
items_pattern: ','.key_value_pattern+ ','?
key_value_pattern:
| (literal_pattern | constant_pattern) ':' or_pattern
| double_star_pattern
double_star_pattern: '**' capture_pattern
class_pattern:
| name_or_attr '(' [pattern_arguments ','?] ')'
pattern_arguments:
| positional_patterns [',' keyword_patterns]
| keyword_patterns
positional_patterns: ','.pattern+
keyword_patterns: ','.keyword_pattern+
keyword_pattern: NAME '=' or_pattern
Copyright
=========
This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: