PEP 622: Include overview to different patterns and summarised syntax (#1501)

Also some other improvements, and added Daniel as author (that was Guido's work).

Co-authored-by: Brandt Bucher <brandtbucher@gmail.com>
Daniel F Moisset 2020-07-08 00:00:55 +01:00 committed by GitHub
parent e43a11d86e
commit 26ac4b3d3e
1 changed files with 146 additions and 17 deletions


@ -3,6 +3,7 @@ Title: Structural Pattern Matching
Version: $Revision$
Last-Modified: $Date$
Author: Brandt Bucher <brandtbucher@gmail.com>,
Daniel F Moisset <dfmoisset@gmail.com>,
Tobias Kohn <kohnt@tobiaskohn.ch>,
Ivan Levkivskyi <levkivskyi@gmail.com>,
Guido van Rossum <guido@python.org>,
@ -14,7 +15,7 @@ Type: Standards Track
Content-Type: text/x-rst
Created: 23-Jun-2020
Python-Version: 3.10
Post-History: 23-Jun-2020
Post-History: 23-Jun-2020, 8-Jul-2020
Resolution:
@ -280,8 +281,67 @@ section for more details.
Syntax and Semantics
====================
Case clauses
------------
Patterns
--------
A **pattern** is a new syntactic construct that can be considered a loose
generalization of assignment targets. The key properties of a pattern are what
types and shapes of subjects it accepts, what variables it captures and how
it extracts them from the subject. For example the pattern ``[a, b]`` matches
only sequences of exactly 2 elements, extracting the first element into ``a``
and the second one into ``b``.
This PEP defines several types of patterns. These are certainly not the
only possible ones, so the design decision was made to choose a subset of
functionality that is useful now but conservative. More patterns can be added
later as this feature gets more widespread use. See the `rejected ideas`_
and `deferred ideas`_ sections for more details.
The patterns listed here are described in more detail below, but summarized
together in this section for simplicity:
- A **literal pattern** is useful to filter constant values in a structure.
It looks like a Python literal (including some values like ``True``,
``False`` and ``None``). It only matches objects equal to the literal, and
never binds.
- A **capture pattern** looks like ``x`` and is equivalent to an identical
assignment target: it always matches and binds the variable
with the given name.
- The **wildcard pattern** is a single underscore: ``_``. It always matches,
but does not capture any variable (which prevents interference with other
uses for ``_`` and allows for some optimizations).
- A **constant value pattern** works like the literal but for certain named
constants. Note that it must be a qualified (dotted) name, given the possible
ambiguity with a capture pattern. It looks like ``Color.RED`` and
only matches values equal to the corresponding value. It never binds.
- A **sequence pattern** looks like ``[a, *rest, b]`` and is similar to
a list unpacking. An important difference is that the elements nested
within it can be any kind of patterns, not just names or sequences.
It matches only sequences of appropriate length, as long as all the sub-patterns
also match. It makes all the bindings of its sub-patterns.
- A **mapping pattern** looks like ``{"user": u, "emails": [*es]}``. It matches
mappings with at least the set of provided keys, and if all the
sub-patterns match their corresponding values. It binds whatever the
sub-patterns bind while matching with the values corresponding to the keys.
Adding ``**rest`` at the end of the pattern to capture extra items is allowed.
- A **class pattern** is similar to the above but matches attributes instead
of keys. It looks like ``datetime.date(year=y, day=d)``. It matches
instances of the given type, having at least the specified
attributes, as long as the attributes match with the corresponding
sub-patterns. It binds whatever the sub-patterns bind when matching with the
values of the given attributes. An optional protocol also allows matching
positional arguments.
- An **OR pattern** looks like ``[*x] | {"elems": [*x]}``. It matches if any
of its sub-patterns match. It uses the binding for the leftmost pattern
that matched.
- A **walrus pattern** looks like ``d := datetime(year=2020, month=m)``. It
matches only if its sub-pattern also matches. It binds whatever its
sub-pattern binds, and also binds the named variable to the entire object.
The ``match`` statement
-----------------------
A simplified, approximate grammar for the proposed syntax is::
@ -299,14 +359,17 @@ A simplified, approximate grammar for the proposed syntax is::
closed_pattern:
| literal_pattern
| capture_pattern
| wildcard_pattern
| constant_pattern
| sequence_pattern
| mapping_pattern
| class_pattern
(See `Appendix A`_ for the full, unabridged grammar.)
See `Appendix A`_ for the full, unabridged grammar. The simplified grammars in
this section are intended to help the reader, not to serve as a full specification.
We propose the match syntax to be a statement, not an expression. Although in
We propose that the match operation should be a statement, not an expression.
Although in
many languages it is an expression, being a statement better suits the general
logic of Python syntax. See `rejected ideas`_ for more discussion. The list of
allowed patterns is specified below in the `patterns`_ subsection.
@ -384,6 +447,16 @@ building blocks. The following patterns are supported:
Literal Patterns
~~~~~~~~~~~~~~~~
Simplified syntax::
literal_pattern:
| number
| string
| 'None'
| 'True'
| 'False'
A literal pattern consists of a simple literal like a string, a number,
a Boolean literal (``True`` or ``False``), or ``None``::
@ -427,6 +500,10 @@ really literals).
Capture Patterns
~~~~~~~~~~~~~~~~
Simplified syntax::
capture_pattern: NAME
A capture pattern serves as an assignment target for the matched expression::
match greeting:
@ -449,30 +526,51 @@ the ``""`` case clause was taken::
... # but works fine if greeting was not empty
While matching against each case clause, a name may be bound at most
once, having two capture patterns with coinciding names is an error. An
exception is made for the special single underscore (``_``) name; in
patterns, it's a wildcard that *never* binds::
once; having two capture patterns with coinciding names is an error::
match data:
case [x, x]: # Error!
...
case [_, _]:
print("Some pair")
print(_) # Error!
Note: one can still match on a collection with equal items using `guards`_.
Also, ``[x, y] | Point(x, y)`` is a legal pattern because the two
alternatives are never matched at the same time.
The single underscore (``_``) is not considered a ``NAME`` and is treated
specially as a `wildcard pattern`_.
Reminder: ``None``, ``False`` and ``True`` are keywords denoting
literals, not names.
.. _wildcard_pattern:
Wildcard Pattern
~~~~~~~~~~~~~~~~
Simplified syntax::
wildcard_pattern: "_"
The single underscore (``_``) name is a special kind of pattern that always
matches but *never* binds::
match data:
case [_, _]:
print("Some pair")
print(_) # Error!
Given that no binding is made, it can be used as many times as desired, unlike
capture patterns.
.. _constant_value_pattern:
Constant Value Patterns
~~~~~~~~~~~~~~~~~~~~~~~
Simplified syntax::
constant_pattern: NAME ('.' NAME)+
This is used to match against constants and enum values.
Every dotted name in a pattern is looked up using normal Python name
resolution rules, and the value is used for comparison by equality with
@ -502,6 +600,14 @@ considered for constant value patterns.
Sequence Patterns
~~~~~~~~~~~~~~~~~
Simplified syntax::
sequence_pattern:
| '[' [values_pattern] ']'
| '(' [value_pattern ',' [values_pattern]] ')'
values_pattern: ','.value_pattern+ ','?
value_pattern: '*' capture_pattern | pattern
A sequence pattern follows the same semantics as unpacking assignment.
Like unpacking assignment, both tuple-like and list-like syntax can be
used, with identical semantics. Each element can be an arbitrary
@ -533,6 +639,15 @@ example:
Mapping Patterns
~~~~~~~~~~~~~~~~
Simplified syntax::
mapping_pattern: '{' [items_pattern] '}'
items_pattern: ','.key_value_pattern+ ','?
key_value_pattern:
| (literal_pattern | constant_pattern) ':' or_pattern
| '**' capture_pattern
A mapping pattern is a generalization of iterable unpacking to mappings.
Its syntax is similar to dictionary display but each key and value are
patterns ``"{" (pattern ":" pattern)+ "}"``. A ``**name`` pattern is also
@ -568,6 +683,16 @@ were already present when the ``match`` block was entered.
Class Patterns
~~~~~~~~~~~~~~
Simplified syntax::
class_pattern:
| name_or_attr '(' ')'
| name_or_attr '(' ','.pattern+ ','? ')'
| name_or_attr '(' ','.keyword_pattern+ ','? ')'
| name_or_attr '(' ','.pattern+ ',' ','.keyword_pattern+ ','? ')'
keyword_pattern: NAME '=' or_pattern
A class pattern provides support for destructuring arbitrary objects.
There are two possible ways of matching on object attributes: by position
like ``Point(1, 2)``, and by name like ``Point(x=1, y=2)``. These
@ -594,7 +719,7 @@ The leading name must not be ``_``, so e.g. ``_(...)`` and
matched object has an attribute ``foo``.
By default, sub-patterns may only be matched by keyword for
user-defined classes. In order to suport positional sub-patterns, a
user-defined classes. In order to support positional sub-patterns, a
custom ``__match_args__`` attribute is required.
The runtime allows matching against
arbitrarily nested patterns by chaining all of the instance checks and
@ -640,7 +765,7 @@ the same set of variables (excluding ``_``). For example::
Guards
------
Each *top-level* pattern can be followed by a guard of the form
Each *top-level* pattern can be followed by a **guard** of the form
``if expression``. A case clause succeeds if the pattern matches and the guard
evaluates to a true value. For example::
@ -700,7 +825,7 @@ match statements, but this will be less readable and/or will produce less
efficient code. Essentially, most of the arguments in PEP 572 apply here
equally.
``_`` is not a valid name here.
The wildcard ``_`` is not a valid name here.
.. _runtime:
@ -1940,7 +2065,6 @@ We are grateful for the help of the following individuals (among many
others) for helping out during various phases of the writing of this
PEP:
- Daniel F Moisset
- Taine Zhao
- Nate Lust
@ -1959,8 +2083,9 @@ Version History
- Why we choose ``_`` for wildcard patterns
- Why we choose ``|`` for OR patterns
- Why we choose not to use special syntax for capture variables
- Why this pattern matching operation and not others
- Clarify exception semantics
- Clarify exception and side effect semantics
- Clarify partial binding semantics
- Drop restriction on use of ``_`` in load contexts
- Simplify behavior of ``__match_args__``
@ -1968,7 +2093,11 @@ Version History
- Drop ``ImpossibleMatchError`` exception
- Drop leading dot for loads (moved to `deferred ideas`_)
- Reworked the initial sections (everything before `syntax`_)
- Added an overview of all the types of patterns before the
detailed description
- Added simplified syntax next to the description of each pattern
- Separate description of the wildcard from capture patterns
- Added Daniel F Moisset as sixth co-author
References
==========