PEP: 642
Title: Explicit Pattern Syntax for Structural Pattern Matching
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan@gmail.com>
BDFL-Delegate:
Discussions-To: Python-Dev <python-dev@python.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Requires: 634
Created: 26-Sep-2020
Python-Version: 3.10
Post-History: 31-Oct-2020, 8-Nov-2020, 3-Jan-2021
Resolution:
Abstract
========
This PEP covers an alternative syntax proposal for PEP 634's structural pattern
matching that requires explicit prefixes on all capture patterns and value
constraints. It also proposes a new dedicated syntax for instance attribute
patterns that aligns more closely with the proposed mapping pattern syntax.
While the result is necessarily more verbose than the proposed syntax in
PEP 634, it is still significantly less verbose than the status quo.
As an example, the following match statement would extract "host" and "port"
details from a 2 item sequence, a mapping with "host" and "port" keys, any
object with "host" and "port" attributes, or a "host:port" string, treating
the "port" as optional in the latter three cases::
    port = DEFAULT_PORT
    match expr:
        case [as host, as port]:
            pass
        case {"host" as host, "port" as port}:
            pass
        case {"host" as host}:
            pass
        case object{.host as host, .port as port}:
            pass
        case object{.host as host}:
            pass
        case str{} as addr:
            host, __, optional_port = addr.partition(":")
            if optional_port:
                port = optional_port
        case __ as m:
            raise TypeError(f"Unknown address format: {m!r:.200}")
    port = int(port)
At a high level, this PEP proposes to categorise the different available pattern
types as follows:
* wildcard pattern: ``__``
* group patterns: ``(PTRN)``
* value constraint patterns:

  * equality constraints: ``== EXPR``
  * identity constraints: ``is EXPR``

* structural constraint patterns:

  * sequence constraint patterns: ``[PTRN, as NAME, PTRN as NAME]``
  * mapping constraint patterns: ``{EXPR: PTRN, EXPR as NAME}``
  * instance attribute constraint patterns:
    ``CLS{.NAME, .NAME: PTRN, .NAME == EXPR, .NAME as NAME}``
  * class defined constraint patterns:
    ``CLS(PTRN, PTRN, **{.NAME, .NAME: PTRN, .NAME == EXPR, .NAME as NAME})``

* OR patterns: ``PTRN | PTRN | PTRN``
* AS patterns: ``PTRN as NAME`` (omitting the pattern implies ``__``)
The intent of this approach is to:
* allow an initial form of pattern matching to be developed and released without
needing to decide up front on the best default options for handling bare names,
attribute lookups, and literal values
* ensure that pattern matching is defined explicitly at the Abstract Syntax Tree
level, allowing the specifications of the semantics and the surface syntax for
pattern matching to be clearly separated
* define a clear and concise "ducktyping" syntax that could potentially be
adopted in ordinary expressions as a way to more easily retrieve a tuple
containing multiple attributes from the same object
Relative to PEP 634, the proposal also deliberately eliminates any syntax that
"binds to the right" without using the ``as`` keyword (using capture patterns
in PEP 634's mapping patterns and class patterns) or binds to both the left and
the right in the same pattern (using PEP 634's capture patterns with AS patterns)
Relationship with other PEPs
============================
This PEP both depends on and competes with PEP 634 - the PEP author agrees that
match statements would be a sufficiently valuable addition to the language to
be worth the additional complexity that they add to the learning process, but
disagrees with the idea that "simple name vs literal or attribute lookup"
really offers an adequate syntactic distinction between name binding and value
lookup operations in match patterns (at least for Python).
This PEP agrees with the spirit of PEP 640 (that the chosen wildcard pattern to
skip a name binding should be supported everywhere, not just in match patterns),
but is now proposing a different spelling for the wildcard syntax (``__`` rather
than ``?``). As such, it competes with PEP 640 as written, but would complement
a proposal to deprecate the use of ``__`` as an ordinary identifier and instead
turn it into a general purpose wildcard marker that always skips making a new
local variable binding.
While it has not yet been put forward as a PEP, Mark Shannon has a pre-PEP draft
[8_] expressing several concerns about the runtime semantics of the pattern
matching proposal in PEP 634. This PEP is somewhat complementary to that one, as
even though this PEP is mostly about surface syntax changes rather than major
semantic changes, it does propose that the Abstract Syntax Tree definition be
made more explicit to better separate the details of the surface syntax from the
semantics of the code generation step. There is one specific idea in that pre-PEP
draft that this PEP explicitly rejects: the idea that the different kinds of
matching are mutually exclusive. It's entirely possible for the same value to
match different kinds of structural pattern, and which one takes precedence will
intentionally be governed by the order of the cases in the match statement.
Motivation
==========
The original PEP 622 (which was later split into PEP 634, PEP 635, and PEP 636)
incorporated an unstated but essential assumption in its syntax design: that
neither ordinary expressions *nor* the existing assignment target syntax provide
an adequate foundation for the syntax used in match patterns.
While the PEP didn't explicitly state this assumption, one of the PEP authors
explained it clearly on python-dev [1_]:
The actual problem that I see is that we have different cultures/intuitions
fundamentally clashing here. In particular, so many programmers welcome
pattern matching as an "extended switch statement" and find it therefore
strange that names are binding and not expressions for comparison. Others
argue that it is at odds with current assignment statements, say, and
question why dotted names are _/not/_ binding. What all groups seem to
have in common, though, is that they refer to _/their/_ understanding and
interpretation of the new match statement as 'consistent' or 'intuitive'
--- naturally pointing out where we as PEP authors went wrong with our
design.
But here is the catch: at least in the Python world, pattern matching as
proposed by this PEP is an unprecedented and new way of approaching a common
problem. It is not simply an extension of something already there. Even
worse: while designing the PEP we found that no matter from which angle you
approach it, you will run into issues of seeming 'inconsistencies' (which is
to say that pattern matching cannot be reduced to a 'linear' extension of
existing features in a meaningful way): there is always something that goes
fundamentally beyond what is already there in Python. That's why I argue
that arguments based on what is 'intuitive' or 'consistent' just do not
make sense _/in this case/_.
The first iteration of this PEP was then born out of an attempt to show that the
second assertion was not accurate, and that match patterns could be treated
as a variation on assignment targets without leading to inherent contradictions.
(An earlier PR submitted to list this option in the "Rejected Ideas" section
of the original PEP 622 had previously been declined [2_]).
However, the review process for this PEP strongly suggested that not only did
the contradictions that Tobias mentioned in his email exist, but they were also
concerning enough to cast doubts on the syntax proposal presented in PEP 634.
Accordingly, this PEP was changed to go even further than PEP 634, and largely
abandon alignment between the sequence matching syntax and the existing iterable
unpacking syntax (effectively answering "Not really, at least as far as the
exact syntax is concerned" to the first question raised in the DLS'20 paper
[9_]: "Can we extend a feature like iterable unpacking to work for more general
object and data layouts?").
This resulted in a complete reversal of the goals of the PEP: rather than
attempting to emphasise the similarities between assignment and pattern matching,
the PEP now attempts to make sure that assignment target syntax isn't being
reused *at all*, reducing the likelihood of incorrect inferences being drawn
about the new construct based on experience with existing ones.
Finally, before completing the 3rd iteration of the proposal (which dropped
inferred patterns entirely), the PEP author spent quite a bit of time reflecting
on the following entries in PEP 20:
* Explicit is better than implicit.
* Special cases aren't special enough to break the rules.
* In the face of ambiguity, refuse the temptation to guess.
If we start with an explicit syntax, we can always add syntactic shortcuts later
(e.g. consider the recent proposals to add shortcuts for ``Union`` and
``Optional`` type hints only after years of experience with the original more
verbose forms), while if we start out with only the abbreviated forms,
then we don't have any real way to revisit those decisions in a future release.
Specification
=============
This PEP retains the overall `match`/`case` statement structure and semantics
from PEP 634, but proposes multiple changes that mean that user intent is
explicitly specified in the concrete syntax rather than needing to be inferred
from the pattern matching context.
In the proposed Abstract Syntax Tree, the semantics are also always explicit,
with no inference required.
The Match Statement
-------------------
Surface syntax::
match_stmt: "match" subject_expr ':' NEWLINE INDENT case_block+ DEDENT
subject_expr:
| star_named_expression ',' star_named_expressions?
| named_expression
case_block: "case" (guarded_pattern | open_pattern) ':' block
guarded_pattern: closed_pattern 'if' named_expression
open_pattern:
| as_pattern
| or_pattern
closed_pattern:
| wildcard_pattern
| group_pattern
| structural_constraint
Abstract syntax::
Match(expr subject, match_case* cases)
match_case = (pattern pattern, expr? guard, stmt* body)
The rules ``star_named_expression``, ``star_named_expressions``,
``named_expression`` and ``block`` are part of the `standard Python
grammar <https://docs.python.org/3.10/reference/grammar.html>`_.
Open patterns are patterns which consist of multiple tokens, and aren't
necessarily terminated by a closing delimiter (for example, ``__ as x``,
``int() | bool()``). To avoid ambiguity for human readers, their usage is
restricted to top level patterns and to group patterns (which are patterns
surrounded by parentheses).
Closed patterns are patterns which either consist of a single token
(i.e. ``__``), or else have a closing delimiter as a required part of their
syntax (e.g. ``[as x, as y]``, ``object{.x as x, .y as y}``).
As in PEP 634, the ``match`` and ``case`` keywords are soft keywords, i.e. they
are not reserved words in other grammatical contexts (including at the
start of a line if there is no colon where expected). This means
that they are recognized as keywords when part of a match
statement or case block only, and are allowed to be used in all
other contexts as variable or argument names.
Unlike PEP 634, patterns are explicitly defined as a new kind of node in the
abstract syntax tree - even when surface syntax is shared with existing
expression nodes, a distinct abstract node is emitted by the parser.
For context, ``match_stmt`` is a new alternative for
``compound_statement`` in the surface syntax and ``Match`` is a new
alternative for ``stmt`` in the abstract syntax.
Match Semantics
^^^^^^^^^^^^^^^
This PEP largely retains the overall pattern matching semantics proposed in
PEP 634.
The proposed syntax for patterns changes significantly, and is discussed in
detail below.
There are also some proposed changes to the semantics of class defined
constraints (class patterns in PEP 634) to eliminate the need to special case
any builtin types (instead, the introduction of dedicated syntax for instance
attribute constraints allows the behaviour needed by those builtin types to be
specified as applying to any type that sets ``__match_args__`` to ``None``).
.. _guards:
Guards
^^^^^^
This PEP retains the guard clause semantics proposed in PEP 634.
However, the syntax is changed slightly to require that when a guard clause
is present, the case pattern must be a *closed* pattern.
This makes it clearer to the reader where the pattern ends and the guard clause
begins. (This is mainly a potential problem with OR patterns, where the guard
clause looks kind of like the start of a conditional expression in the final
pattern. Actually doing that isn't legal syntax, so there's no ambiguity as far
as the compiler is concerned, but the distinction may not be as clear to a human
reader.)
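An illustrative sketch of guarded case blocks under this proposal (the subject
and helper names here are placeholders, not part of the proposal); note that an
OR pattern must be parenthesised into a group pattern before a guard can be
attached::

    match command.split():
        case [as action, as target] if target in known_targets:
            run(action, target)
        case (== "help" | == "?") if interactive:
            show_help()
        case __:
            print("Unknown command")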
Irrefutable case blocks
^^^^^^^^^^^^^^^^^^^^^^^
The definition of irrefutable case blocks changes slightly in this PEP relative
to PEP 634, as capture patterns no longer exist as a separate concept from
AS patterns.
Aside from that caveat, the handling of irrefutable cases is the same as in
PEP 634:
* wildcard patterns are irrefutable
* AS patterns whose left-hand side is irrefutable
* OR patterns containing at least one irrefutable pattern
* parenthesized irrefutable patterns
* a case block is considered irrefutable if it has no guard and its
pattern is irrefutable.
* a match statement may have at most one irrefutable case block, and it
must be last.
.. _patterns:
Patterns
--------
The top-level surface syntax for patterns is as follows::
open_pattern: # Pattern may use multiple tokens with no closing delimiter
| as_pattern
| or_pattern
as_pattern: [closed_pattern] pattern_as_clause
or_pattern: '|'.simple_pattern+
simple_pattern: # Subnode where "as" and "or" patterns must be parenthesised
| closed_pattern
| value_constraint
closed_pattern: # Require a single token or a closing delimiter in pattern
| wildcard_pattern
| group_pattern
| structural_constraint
As described above, the usage of open patterns is limited to top level case
clauses and when parenthesised in a group pattern.
The abstract syntax for patterns explicitly indicates which elements are
subpatterns and which elements are subexpressions or identifiers::
pattern = MatchAlways
| MatchValue(matchop op, expr value)
| MatchSequence(pattern* patterns)
| MatchMapping(expr* keys, pattern* patterns)
| MatchAttrs(expr cls, identifier* attrs, pattern* patterns)
| MatchClass(expr cls, pattern* patterns, identifier* extra_attrs, pattern* extra_patterns)
| MatchRestOfSequence(identifier? target)
-- A NULL entry in the MatchMapping key list handles capturing extra mapping keys
| MatchAs(pattern? pattern, identifier target)
| MatchOr(pattern* patterns)
AS Patterns
^^^^^^^^^^^
Surface syntax::
as_pattern: [closed_pattern] pattern_as_clause
pattern_as_clause: 'as' pattern_capture_target
pattern_capture_target: !"__" NAME !('.' | '(' | '=')
(Note: the name on the right may not be ``__``.)
Abstract syntax::
MatchAs(pattern? pattern, identifier target)
An AS pattern matches the closed pattern on the left of the ``as``
keyword against the subject. If this fails, the AS pattern fails.
Otherwise, the AS pattern binds the subject to the name on the right
of the ``as`` keyword and succeeds.
If no pattern to match is given, the wildcard pattern (``__``) is implied.
To avoid confusion with the `wildcard pattern`_, the double underscore (``__``)
is not permitted as a capture target (this is what ``!"__"`` expresses).
A capture pattern always succeeds. It binds the subject value to the
name using the scoping rules for name binding established for named expressions
in PEP 572. (Summary: the name becomes a local
variable in the closest containing function scope unless there's an
applicable ``nonlocal`` or ``global`` statement.)
In a given pattern, a given name may be bound only once. This
disallows for example ``case [as x, as x]: ...`` but allows
``case [as x] | (as x): ...``.
As an open pattern, the usage of AS patterns is limited to top level case
clauses and when parenthesised in a group pattern. However, several of the
structural constraints allow the use of ``pattern_as_clause`` in relevant
locations to bind extracted elements of the matched subject to local variables.
These are mostly represented in the abstract syntax tree as ``MatchAs`` nodes,
aside from the dedicated ``MatchRestOfSequence`` node in sequence patterns.
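As an illustrative sketch (using the syntax proposed in this PEP, with
placeholder names), an AS pattern can bind the entire subject at the top level
of a case clause, while nested ``as`` clauses bind extracted elements::

    match expr:
        case [as first, *__] as items:   # binds the first element and the whole sequence
            process(first, items)
        case (== 0 | == 1) as bit:       # open OR pattern, parenthesised and then bound
            toggle(bit)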
OR Patterns
^^^^^^^^^^^
Surface syntax::
or_pattern: '|'.simple_pattern+
simple_pattern: # Subnode where "as" and "or" patterns must be parenthesised
| closed_pattern
| value_constraint
Abstract syntax::
MatchOr(pattern* patterns)
When two or more patterns are separated by vertical bars (``|``),
this is called an OR pattern. (A single simple pattern is just that)
Only the final subpattern may be irrefutable.
Each subpattern must bind the same set of names.
An OR pattern matches each of its subpatterns in turn to the subject,
until one succeeds. The OR pattern is then deemed to succeed.
If none of the subpatterns succeed the OR pattern fails.
Subpatterns are mostly required to be closed patterns, but the parentheses may
be omitted for value constraints.
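An illustrative sketch (placeholder names): since value constraints may appear
unparenthesised as OR subpatterns, multi-value checks stay reasonably compact::

    match status_code:
        case == 401 | == 403 | == 404:
            return "Not allowed"
        case is None | == 0:
            return "No response"
        case __:
            return "Something else"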
.. _value_constraints:
Value constraints
^^^^^^^^^^^^^^^^^
Surface syntax::
value_constraint:
| eq_constraint
| id_constraint
eq_constraint: '==' closed_expr
id_constraint: 'is' closed_expr
closed_expr: # Require a single token or a closing delimiter in expression
| primary
| closed_factor
closed_factor: # "factor" is the main grammar node for these unary ops
| '+' primary
| '-' primary
| '~' primary
Abstract syntax::
MatchValue(matchop op, expr value)
matchop = EqCheck | IdCheck
The rule ``primary`` is defined in the standard Python grammar, and only
allows expressions that either consist of a single token, or else are required
to end with a closing delimiter.
Value constraints replace PEP 634's literal patterns and value patterns.
Equality constraints are written as ``== EXPR``, while identity constraints are
written as ``is EXPR``.
An equality constraint succeeds if the subject value compares equal to the
value given on the right, while an identity constraint succeeds only if they are
the exact same object.
The expressions to be compared against are largely restricted to either
single tokens (e.g. names, strings, numbers, builtin constants), or else to
expressions that are required to end with a closing delimiter.
The use of the high precedence unary operators is also permitted, as the risk of
perceived ambiguity is low, and being able to specify negative numbers without
parentheses is desirable.
When the same constraint expression occurs multiple times in the same match
statement, the interpreter may cache the first value calculated and reuse it,
rather than repeat the expression evaluation. (As for PEP 634 value patterns,
this cache is strictly tied to a given execution of a given match statement.)
Unlike literal patterns in PEP 634, this PEP requires that complex
literals be parenthesised to be accepted by the parser. See the Deferred
Ideas section for discussion on that point.
If this PEP were to be adopted in preference to PEP 634, then all literal and
value patterns would instead be written more explicitly as value constraints::
    # Literal patterns
    match number:
        case == 0:
            print("Nothing")
        case == 1:
            print("Just one")
        case == 2:
            print("A couple")
        case == -1:
            print("One less than nothing")
        case == (1-1j):
            print("Good luck with that...")

    # Additional literal patterns
    match value:
        case == True:
            print("True or 1")
        case == False:
            print("False or 0")
        case == None:
            print("None")
        case == "Hello":
            print("Text 'Hello'")
        case == b"World!":
            print("Binary 'World!'")

    # Matching by identity rather than equality
    SENTINEL = object()
    match value:
        case is True:
            print("True, not 1")
        case is False:
            print("False, not 0")
        case is None:
            print("None, following PEP 8 comparison guidelines")
        case is ...:
            print("May be useful when writing __getitem__ methods?")
        case is SENTINEL:
            print("Matches the sentinel by identity, not just value")

    # Matching against variables and attributes
    from enum import Enum
    class Sides(str, Enum):
        SPAM = "Spam"
        EGGS = "eggs"
        ...

    preferred_side = Sides.EGGS
    match entree[-1]:
        case == Sides.SPAM:  # Compares entree[-1] == Sides.SPAM.
            response = "Have you got anything without Spam?"
        case == preferred_side:  # Compares entree[-1] == preferred_side
            response = f"Oh, I love {preferred_side}!"
        case as side:  # Assigns side = entree[-1].
            response = f"Well, could I have their Spam instead of the {side} then?"
Note the ``== preferred_side`` example: using an explicit prefix marker on
constraint expressions removes the restriction to only working with attributes
or literals for value lookups.
The ``== (1-1j)`` example illustrates the use of parentheses to turn any
subexpression into a closed one.
.. _wildcard_pattern:
Wildcard Pattern
^^^^^^^^^^^^^^^^
Surface syntax::
wildcard_pattern: "__"
Abstract syntax::
MatchAlways
A wildcard pattern always succeeds. As in PEP 634, it binds no name.
Where PEP 634 chooses the single underscore as its wildcard pattern for
consistency with other languages, this PEP chooses the double underscore as that
has a clearer path towards potentially being made consistent across the entire
language, whereas that path is blocked for ``"_"`` by i18n related use cases.
Example usage::
    match sequence:
        case [__]:                      # any sequence with a single element
            return True
        case [as start, *__, as end]:   # a sequence with at least two elements
            return start == end
        case __:                        # anything
            return False
Group Patterns
^^^^^^^^^^^^^^
Surface syntax::
group_pattern: '(' open_pattern ')'
For the syntax of ``open_pattern``, see Patterns above.
A parenthesized pattern has no additional syntax and is not represented in the
abstract syntax tree. It allows users to add parentheses around patterns to
emphasize the intended grouping, and to allow nesting of open patterns when the
grammar requires a closed pattern.
Unlike PEP 634, there is no potential ambiguity with sequence patterns, as
this PEP requires that all sequence patterns be written with square brackets.
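For example, a group pattern is what allows an open OR pattern to appear inside
a sequence constraint, which otherwise only accepts closed patterns and value
constraints (an illustrative sketch)::

    match pair:
        case [(== 0 | == 1), (== 0 | == 1)]:   # the OR patterns must be parenthesised here
            print("A pair of binary digits")
        case __:
            print("Something else")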
Structural constraints
^^^^^^^^^^^^^^^^^^^^^^
Surface syntax::
structural_constraint:
| sequence_constraint
| mapping_constraint
| attrs_constraint
| class_constraint
Note: the separate "structural constraint" subcategory isn't used in the
abstract syntax tree; it's merely a convenient grouping node in the
surface syntax definition.
Structural constraints are patterns used to both make assertions about complex
objects and to extract values from them.
These patterns may all bind multiple values, either through the use of nested
AS patterns, or else through the use of ``pattern_as_clause`` elements included
in the definition of the pattern.
Sequence constraints
^^^^^^^^^^^^^^^^^^^^
Surface syntax::
sequence_constraint: '[' [sequence_constraint_elements] ']'
sequence_constraint_elements: ','.sequence_constraint_element+ ','?
sequence_constraint_element:
| star_pattern
| simple_pattern
| pattern_as_clause
star_pattern: '*' (pattern_as_clause | wildcard_pattern)
simple_pattern: # Subnode where "as" and "or" patterns must be parenthesised
| closed_pattern
| value_constraint
pattern_as_clause: 'as' pattern_capture_target
Abstract syntax::
MatchSequence(pattern* patterns)
MatchRestOfSequence(identifier? target)
Sequence constraints allow items within a sequence to be checked and
optionally extracted.
A sequence pattern fails if the subject value is not an instance of
``collections.abc.Sequence``. It also fails if the subject value is
an instance of ``str``, ``bytes`` or ``bytearray`` (see Deferred Ideas for
a discussion on potentially removing the need for this special casing).
A sequence pattern may contain at most one star subpattern. The star
subpattern may occur in any position and is represented in the AST using the
``MatchRestOfSequence`` node.
If no star subpattern is present, the sequence pattern is a fixed-length
sequence pattern; otherwise it is a variable-length sequence pattern.
A fixed-length sequence pattern fails if the length of the subject
sequence is not equal to the number of subpatterns.
A variable-length sequence pattern fails if the length of the subject
sequence is less than the number of non-star subpatterns.
The length of the subject sequence is obtained using the builtin
``len()`` function (i.e., via the ``__len__`` protocol). However, the
interpreter may cache this value in a similar manner as described for
value constraint expressions.
A fixed-length sequence pattern matches the subpatterns to
corresponding items of the subject sequence, from left to right.
Matching stops (with a failure) as soon as a subpattern fails. If all
subpatterns succeed in matching their corresponding item, the sequence
pattern succeeds.
A variable-length sequence pattern first matches the leading non-star
subpatterns to the corresponding items of the subject sequence, as for
a fixed-length sequence. If this succeeds, the star subpattern
matches a list formed of the remaining subject items, with items
removed from the end corresponding to the non-star subpatterns
following the star subpattern. The remaining non-star subpatterns are
then matched to the corresponding subject items, as for a fixed-length
sequence.
Subpatterns are mostly required to be closed patterns, but the parentheses may
be omitted for value constraints. Sequence elements may also be captured
unconditionally without parentheses.
Note: where PEP 634 allows all the same syntactic flexibility as iterable
unpacking in assignment statements, this PEP restricts sequence patterns
specifically to the square bracket form. Given that the open and parenthesised
forms are far more popular than square brackets for iterable unpacking, this
helps emphasise that iterable unpacking and sequence matching are *not* the
same operation. It also avoids the parenthesised form's ambiguity problem
between single element sequence patterns and group patterns.
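An illustrative sketch of sequence constraints under this proposal (``ORIGIN``
and the subject name are placeholders); following the ``star_pattern`` rule
above, the remainder of the sequence is captured with ``* as NAME``::

    match points:
        case []:                        # empty sequence
            print("No points")
        case [as point]:                # exactly one element, captured
            print(f"Single point: {point}")
        case [== ORIGIN, * as rest]:    # first element equal to ORIGIN, rest captured
            print(f"Starts at the origin, followed by {len(rest)} more point(s)")
        case __:
            print("Something else")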
Mapping constraints
^^^^^^^^^^^^^^^^^^^
Surface syntax::
mapping_constraint: '{' [mapping_constraint_elements] '}'
mapping_constraint_elements: ','.key_value_constraint+ ','?
key_value_constraint:
| closed_expr pattern_as_clause
| closed_expr ':' simple_pattern
| double_star_capture
double_star_capture: '**' pattern_as_clause
(Note that ``**__`` is deliberately disallowed by this syntax, as additional
mapping entries are ignored by default.)
``closed_expr`` is defined above, under value constraints.
Abstract syntax::
MatchMapping(expr* keys, pattern* patterns)
Mapping constraints allow keys and values within a mapping to be checked and
values to optionally be extracted.
A mapping pattern fails if the subject value is not an instance of
``collections.abc.Mapping``.
A mapping pattern succeeds if every key given in the mapping pattern
is present in the subject mapping, and the pattern for
each key matches the corresponding item of the subject mapping.
The presence of keys is checked using the two argument form of the ``get``
method and a unique sentinel value (see the sketch below), which offers the
following benefits:
* no exceptions need to be created in the lookup process
* mappings that implement ``__missing__`` (such as ``collections.defaultdict``)
  only match on keys that they already contain; they don't implicitly add keys
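A minimal sketch of that lookup protocol in ordinary Python (the helper name
and structure are illustrative only, not the actual implementation)::

    _MISSING = object()   # unique sentinel, compared by identity

    def lookup_pattern_keys(subject, keys):
        """Return the values for *keys*, or None if any key is absent."""
        values = []
        for key in keys:
            value = subject.get(key, _MISSING)
            if value is _MISSING:
                return None       # key absent: the mapping pattern fails
            values.append(value)  # each value is then matched against its subpattern
        return values

With a ``collections.defaultdict`` subject, absent keys are reported as missing
without being inserted, giving the behaviour described above.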
A mapping pattern may not contain duplicate key values. If duplicate keys are
detected when checking the mapping pattern, the pattern is considered invalid,
and a ``ValueError`` is raised. While it would theoretically be possible to
checked for duplicated constant keys at compile time, no such check is currently
defined or implemented.
(Note: This semantic description is derived from the PEP 634 reference
implementation, which differs from the PEP 634 specification text at time of
writing. The implementation seems reasonable, so amending the PEP text seems
like the best way to resolve the discrepancy.)
If a ``'**' as NAME`` double star pattern is present, that name is bound to a
``dict`` containing any remaining key-value pairs from the subject mapping
(the dict will be empty if there are no additional key-value pairs).
A mapping pattern may contain at most one double star pattern,
and it must be last.
Value subpatterns are mostly required to be closed patterns, but the parentheses
may be omitted for value constraints (the ``:`` key/value separator is still
required to ensure the entry doesn't look like an ordinary comparison operation).
Mapping values may also be captured unconditionally using the ``KEY as NAME``
form, without either parentheses or the ``:`` key/value separator.
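An illustrative sketch combining these forms (the subject and handler names are
placeholders)::

    match config:
        case {"host" as host, "port": == 443, ** as extra}:
            connect_tls(host, extra)
        case {"host" as host}:
            connect(host, DEFAULT_PORT)
        case __:
            raise ValueError("No host configured")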
Instance attribute constraints
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Surface syntax::
attrs_constraint:
| name_or_attr '{' [attrs_constraint_elements] '}'
attrs_constraint_elements: ','.attr_value_pattern+ ','?
attr_value_pattern:
| '.' NAME pattern_as_clause
| '.' NAME value_constraint
| '.' NAME ':' simple_pattern
| '.' NAME
Abstract syntax::
MatchAttrs(expr cls, identifier* attrs, pattern* patterns)
Instance attribute constraints allow an instance's type to be checked and
attributes to optionally be extracted.
An instance attribute constraint may not repeat the same attribute name multiple
times. Attempting to do so will result in a syntax error.
An instance attribute pattern fails if the subject is not an instance of
``name_or_attr``. This is tested using ``isinstance()``.
If ``name_or_attr`` is not an instance of the builtin ``type``,
``TypeError`` is raised.
If no attribute subpatterns are present, the constraint succeeds if the
``isinstance()`` check succeeds. Otherwise:
- Each given attribute name is looked up as an attribute on the subject.
- If this raises an exception other than ``AttributeError``,
the exception bubbles up.
- If this raises ``AttributeError`` the constraint fails.
- Otherwise, the subpattern associated with the attribute name is matched
against the attribute value. If no subpattern is specified, the wildcard
pattern is assumed. If this fails, the constraint fails.
If it succeeds, the match proceeds to the next attribute.
- If all attribute subpatterns succeed, the constraint as a whole succeeds.
Instance attribute constraints allow ducktyping checks to be implemented by
using ``object`` as the required instance type (e.g.
``case object{.host as host, .port as port}:``).
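A fuller illustrative sketch (the class and attribute names are placeholders)
showing the bare ``.NAME`` existence check, value constraints on attributes,
and capture clauses::

    match shape:
        case Circle{.radius == 0}:
            print("A degenerate circle")
        case Circle{.radius as r}:
            print(f"A circle of radius {r}")
        case Rectangle{.width as w, .height as h}:
            print(f"A {w} x {h} rectangle")
        case object{.area}:              # any object with an "area" attribute
            print(f"Some other shape with area {shape.area}")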
The syntax being proposed here could potentially also be used as the basis for
a new syntax for retrieving multiple attributes from an object instance in one
assignment statement (e.g. ``host, port = addr{.host, .port}``). See the
Deferred Ideas section for further discussion of this point.
Class defined constraints
^^^^^^^^^^^^^^^^^^^^^^^^^
Surface syntax::
class_constraint:
| name_or_attr '(' ')'
| name_or_attr '(' positional_patterns ','? ')'
| name_or_attr '(' class_constraint_attrs ')'
| name_or_attr '(' positional_patterns ',' class_constraint_attrs ')'
positional_patterns: ','.positional_pattern+
positional_pattern:
| simple_pattern
| pattern_as_clause
class_constraint_attrs:
| '**' '{' [attrs_constraint_elements] '}'
Abstract syntax::
MatchClass(expr cls, pattern* patterns, identifier* extra_attrs, pattern* extra_patterns)
Class defined constraints allow a sequence of common attributes to be
specified on a class and checked positionally, rather than needing to specify
the attribute names in every related match pattern.
As for instance attribute patterns:
- a class defined pattern fails if the subject is not an instance of
``name_or_attr``. This is tested using ``isinstance()``.
- if ``name_or_attr`` is not an instance of the builtin ``type``,
``TypeError`` is raised.
Regardless of whether or not any arguments are present, the class given in the
pattern is checked for a ``__match_args__`` attribute using the equivalent of
``getattr(cls, "__match_args__", _SENTINEL)``.
If this raises an exception the exception bubbles up.
If the returned value is not a list, tuple, or ``None``, the conversion fails
and ``TypeError`` is raised at runtime.
This means that only types that actually define ``__match_args__`` will be
usable in class defined patterns. Types that don't define ``__match_args__``
will still be usable in instance attribute patterns.
If ``__match_args__`` is ``None``, then only a single positional subpattern is
permitted. Attempting to specify additional attribute patterns either
positionally or using the double star syntax will cause ``TypeError`` to be
raised at runtime.
This positional subpattern is then matched against the entire subject, allowing
a type check to be combined with another match pattern (e.g. checking both
the type and contents of a container, or the type and value of a number).
If ``__match_args__`` is a list or tuple, then the class defined constraint is
converted to an instance attributes constraint as follows:
- if only the double star attribute constraints subpattern is present, matching
proceeds as if for the equivalent instance attributes constraint.
- if there are more positional subpatterns than the length of
``__match_args__`` (as obtained using ``len()``), ``TypeError`` is raised.
- Otherwise, positional pattern ``i`` is converted to an attribute pattern
using ``__match_args__[i]`` as the attribute name.
- if any element in ``__match_args__`` is not a string, ``TypeError`` is raised.
- once the positional patterns have been converted to attribute patterns, then
they are combined with any attribute constraints given in the double star
attribute constraints subpattern, and matching proceeds as if for the
equivalent instance attributes constraint.
Note: the ``__match_args__ is None`` handling in this PEP replaces the special
casing of ``bool``, ``bytearray``, ``bytes``, ``dict``, ``float``,
``frozenset``, ``int``, ``list``, ``set``, ``str``, and ``tuple`` in PEP 634.
However, the optimised fast path for those types is retained in the
implementation.
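An illustrative sketch (``Point`` is a hypothetical class; the final case
relies on the ``__match_args__ is None`` behaviour this PEP proposes for
builtins such as ``int``)::

    class Point:
        __match_args__ = ("x", "y")   # positional subpatterns map to these attributes
        def __init__(self, x, y):
            self.x = x
            self.y = y

    match subject:
        case Point(== 0, == 0):
            print("The origin")
        case Point(== 0, as y):
            print(f"On the y axis at {y}")
        case Point(as x, as y, **{.colour == "red"}):
            # only matches points that also have a "colour" attribute equal to "red"
            print(f"A red point at ({x}, {y})")
        case int(== 0):
            # int's __match_args__ is None, so the single positional subpattern
            # is matched against the entire subject
            print("The integer zero")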
Design Discussion
=================
Requiring explicit qualification of simple names in match patterns
------------------------------------------------------------------
The first iteration of this PEP accepted the basic premise of PEP 634 that
iterable unpacking syntax would provide a good foundation for defining a new
syntax for pattern matching.
During the review process, however, two major and one minor ambiguity problems
were highlighted that arise directly from that core assumption:
* most problematically, when binding simple names by default is extended to
PEP 634's proposed class pattern syntax, the ``ATTR=TARGET_NAME`` construct
binds to the right without using the ``as`` keyword, and uses the normal
assignment-to-the-left sigil (``=``) to do it!
* when binding simple names by default is extended to PEP 634's proposed mapping
pattern syntax, the ``KEY: TARGET_NAME`` construct binds to the right without
using the ``as`` keyword
* using a PEP 634 capture pattern together with an AS pattern
(``TARGET_NAME_1 as TARGET_NAME_2``) gives an odd "binds to both the left and
right" behaviour
The third revision of this PEP accounted for this problem by abandoning the
alignment with iterable unpacking syntax, and instead requiring that all uses
of bare simple names for anything other than a variable lookup be qualified by
a preceding sigil or keyword:
* ``as NAME``: local variable binding
* ``.NAME``: attribute lookup
* ``== NAME``: variable lookup
* ``is NAME``: variable lookup
* any other usage: variable lookup
The key benefit of this approach is that it makes interpretation of simple names
in patterns a local activity: a leading ``as`` indicates a name binding, a
leading ``.`` indicates an attribute lookup, and anything else is a variable
lookup (regardless of whether we're reading a subpattern or a subexpression).
With the syntax now proposed in this PEP, the problematic cases identified above
no longer read poorly:
* ``.ATTR as TARGET_NAME`` is more obviously a binding than ``ATTR=TARGET_NAME``
* ``KEY as TARGET_NAME`` is more obviously a binding than ``KEY: TARGET_NAME``
* ``(as TARGET_NAME_1) as TARGET_NAME_2`` is more obviously two bindings than
``TARGET_NAME_1 as TARGET_NAME_2``
Resisting the temptation to guess
---------------------------------
PEP 635 looks at the way pattern matching is used in other languages, and
attempts to use that information to make plausible predictions about the way
pattern matching will be used in Python:
* wanting to extract values to local names will *probably* be more common than
wanting to match against values stored in local names
* wanting comparison by equality will *probably* be more common than wanting
comparison by identity
* users will *probably* be able to at least remember that bare names bind values
and attribute references look up values, even if they can't figure that out
for themselves without reading the documentation or having someone tell them
To be clear, I think these predictions actually *are* plausible. However, I also
don't think we need to guess about this up front: I think we can start out with
a more explicit syntax that requires users to state their intent using a prefix
marker (either ``as``, ``==``, or ``is``), and then reassess the situation in a
few years based on how pattern matching is actually being used *in Python*.
At that point, we'll be able to choose amongst at least the following options:
* deciding the explicit syntax is concise enough, and not changing anything
* adding inferred identity constraints for one or more of ``None``, ``...``,
``True`` and ``False``
* adding inferred equality constraints for other literals (potentially including
complex literals)
* adding inferred equality constraints for attribute lookups
* adding either inferred equality constraints or inferred capture patterns for
bare names
All of those ideas could be considered independently on their own merits, rather
than being a potential barrier to introducing pattern matching in the first
place.
If any of these syntactic shortcuts were to eventually be introduced, they'd
also be straightforward to explain in terms of the underlying more explicit
syntax (the leading ``as``, ``==``, or ``is`` would just be getting inferred
by the parser, without the user needing to provide it explicitly). At the
implementation level, only the parser should need to change, as the existing
AST nodes could be reused.
Interaction with caching of attribute lookups in local variables
----------------------------------------------------------------
One of the major changes between this PEP and PEP 634 is to use ``== EXPR``
for equality constraint lookups, rather than only offering ``NAME.ATTR``. The
original motivation for this was to avoid the semantic conflict with regular
assignment targets, where ``NAME.ATTR`` is already used in assignment statements
to set attributes, so if ``NAME.ATTR`` were the *only* syntax for symbolic value
matching, then we're pre-emptively ruling out any future attempts to allow
matching against single patterns using the existing assignment statement syntax.
The current motivation is more about the general desire to avoid guessing about
users' intent, and instead requiring them to state it explicitly in the syntax.
However, even within match statements themselves, the ``name.attr`` syntax for
value patterns has an undesirable interaction with local variable assignment,
where routine refactorings that would be semantically neutral for any other
Python statement introduce a major semantic change when applied to a PEP 634
style match statement.
Consider the following code::
    while value < self.limit:
        ... # Some code that adjusts "value"
The attribute lookup can be safely lifted out of the loop and only performed
once::
    _limit = self.limit
    while value < _limit:
        ... # Some code that adjusts "value"
With the marker prefix based syntax proposal in this PEP, value constraints
would be similarly tolerant of match patterns being refactored to use a local
variable instead of an attribute lookup, with the following two statements
being functionally equivalent::
    match expr:
        case {"key": == self.target}:
            ... # Handle the case where 'expr["key"] == self.target'
        case __:
            ... # Handle the non-matching case

    _target = self.target
    match expr:
        case {"key": == _target}:
            ... # Handle the case where 'expr["key"] == self.target'
        case __:
            ... # Handle the non-matching case
By contrast, when using PEP 634's value and capture pattern syntaxes that omit
the marker prefix, the following two statements wouldn't be equivalent at all::
    # PEP 634's value pattern syntax
    match expr:
        case {"key": self.target}:
            ... # Handle the case where 'expr["key"] == self.target'
        case _:
            ... # Handle the non-matching case

    # PEP 634's capture pattern syntax
    _target = self.target
    match expr:
        case {"key": _target}:
            ... # Matches any mapping with "key", binding its value to _target
        case _:
            ... # Handle the non-matching case
This PEP ensures the original semantics are retained under this style of
simplistic refactoring: use ``== name`` to force interpretation of the result
as a value constraint, use ``as name`` for a name binding.
PEP 634's proposal to offer only the shorthand syntax, with no explicitly
prefixed form, means that the primary answer on offer is "Well, don't do that,
then, only compare against attributes in namespaces, don't compare against
simple names".
PEP 622's walrus pattern syntax had another odd interaction where it might not
bind the same object as the exact same walrus expression in the body of the
case clause, but PEP 634 fixed that discrepancy by replacing walrus patterns
with AS patterns (where the fact that the value bound to the name on the RHS
might not be the same value as returned by the LHS is a standard feature common
to all uses of the "as" keyword).
Using existing comparison operators as the value constraint prefix
--------------------------------------------------------------------
If the benefit of a dedicated value constraint prefix is accepted, then the
next question is to ask exactly what that prefix should be.
The initially published version of this PEP proposed using the previously
unused ``?`` symbol as the prefix for equality constraints, and ``?is`` as the
prefix for identity constraints. When reviewing the PEP, Steven D'Aprano
presented a compelling counterproposal [5_] to use the existing comparison
operators (``==`` and ``is``) instead.
There were a few concerns with ``==`` as a prefix that kept it from being
chosen as the prefix in the initial iteration of the PEP:
* for common use cases, it's even more visually noisy than ``?``, as a lot of
folks with PEP 8 trained aesthetic sensibilities are going to want to put
a space between it and the following expression, effectively making it a 3
character prefix instead of 1
* when used in a mapping pattern, there needs to be a space between the ``:``
key/value separator and the ``==`` prefix, or the tokeniser will split them
up incorrectly (getting ``:=`` and ``=`` instead of ``:`` and ``==``)
* when used in an OR pattern, there needs to be a space between the ``|``
pattern separator and the ``==`` prefix, or the tokeniser will split them
up incorrectly (getting ``|=`` and ``=`` instead of ``|`` and ``==``)
* if used in a PEP 634 style class pattern, there needs to be a space between
the ``=`` keyword separator and the ``==`` prefix, or the tokeniser will split
them up incorrectly (getting ``==`` and ``=`` instead of ``=`` and ``==``)
Rather than introducing a completely new symbol, Steven's proposed resolution to
this verbosity problem was to retain the ability to omit the prefix marker in
syntactically unambiguous cases.
While the idea of omitting the prefix marker was accepted for the second
revision of the proposal, it was dropped again in the third revision due to
ambiguity concerns. Instead, the following points apply:
* for class patterns, other syntax changes allow equality constraints to be
written as ``.ATTR == EXPR``, and identity constraints to be written as
``.ATTR is EXPR``, both of which are quite easy to read
* for mapping patterns, the extra syntactic noise is just tolerated (at least
for now)
* for OR patterns, the extra syntactic noise is just tolerated (at least
for now). However, `membership constraints`_ may offer a future path to
reducing the need to combine OR patterns with equality constraints (instead,
the values to be checked against would be collected as a set, list, or tuple).
Given that perspective, PEP 635's arguments against using ``?`` as part of the
pattern matching syntax held for this proposal as well, and so the PEP was
amended accordingly.
Using ``__`` as the wildcard pattern marker
-------------------------------------------
PEP 635 makes a solid case that introducing ``?`` *solely* as a wildcard pattern
marker would be a bad idea. With the syntax for value constraints changed
to use existing comparison operations rather than ``?`` and ``?is``, that
argument holds for this PEP as well.
However, as noted by Thomas Wouters in [6_], PEP 634's choice of ``_`` remains
problematic as it would likely mean that match patterns would have a *permanent*
difference from all other parts of Python - the use of ``_`` in software
internationalisation and at the interactive prompt means that there isn't really
a plausible path towards using it as a general purpose "skipped binding" marker.
``__`` is an alternative "this value is not needed" marker drawn from a Stack
Overflow answer [7_] (originally posted by the author of this PEP) on the
various meanings of ``_`` in existing Python code.
This PEP also proposes adopting an implementation technique that limits
the scope of the associated special casing of ``__`` to the parser: defining a
new AST node type (``MatchAlways``) specifically for wildcard markers, rather
than passing it through to the AST as a ``Name`` node.
Within the parser, ``__`` still means either a regular name or a wildcard
marker in a match pattern depending on where you were in the parse tree, but
within the rest of the compiler, ``Name("__")`` is still a normal variable name,
while ``MatchAlways()`` is always a wildcard marker in a match pattern.
Unlike ``_``, the lack of other use cases for ``__`` means that there would be
a plausible path towards restoring identifier handling consistency with the rest
of the language by making ``__`` mean "skip this name binding" everywhere in
Python:
* in the interpreter itself, deprecate loading variables with the name ``__``.
This would make reading from ``__`` emit a deprecation warning, while writing
to it would initially be unchanged. To avoid slowing down all name loads, this
could be handled by having the compiler emit additional code for the
deprecated name, rather than using a runtime check in the standard name
loading opcodes.
* after a suitable number of releases, change the parser to emit
a new ``SkippedBinding`` AST node for all uses of ``__`` as an assignment
target, and update the rest of the compiler accordingly
* consider making ``__`` a true hard keyword rather than a soft keyword
This deprecation path couldn't be followed for ``_``, as there's no way for the
interpreter to distinguish between attempts to read back ``_`` when nominally
used as a "don't care" marker, and legitimate reads of ``_`` as either an
i18n text translation function or as the last statement result at the
interactive prompt.
Names starting with double-underscores are also already reserved for use by the
language, whether that is for compile time constants (i.e. ``__debug__``),
special methods, or class attribute name mangling, so using ``__`` here would
be consistent with that existing approach.
Representing patterns explicitly in the Abstract Syntax Tree
------------------------------------------------------------
PEP 634 doesn't explicitly discuss how match statements should be represented
in the Abstract Syntax Tree, instead leaving that detail to be defined as part
of the implementation.
As a result, while the reference implementation of PEP 634 definitely works (and
formed the basis of the reference implementation of this PEP), it does contain
a significant design flaw: despite the notes in PEP 635 that patterns should be
considered as distinct from expressions, the reference implementation goes ahead
and represents them in the AST as expression nodes.
The result is an AST that isn't very abstract at all: nodes that should be
compiled completely differently (because they're patterns rather than
expressions) are represented the same way, and the type system of the
implementation language (e.g. C for CPython) can't offer any assistance in
keeping track of which subnodes should be ordinary expressions and which should
be subpatterns.
Rather than continuing with that approach, this PEP has instead defined a new
explicit "pattern" node in the AST, which allows the patterns and their
permitted subnodes to be defined explicitly in the AST itself, making the code
implementing the new feature clearer, and allowing the C compiler to provide
more assistance in keeping track of when the code generator is dealing with
patterns or expressions.
This change in implementation approach is actually orthogonal to the surface
syntax changes proposed in this PEP, so it could still be adopted even if the
rest of the PEP were to be rejected.
Changes to sequence patterns
----------------------------
This PEP makes one notable change to sequence patterns relative to PEP 634:
* only the square bracket form of sequence pattern is supported. Neither open
(no delimiters) nor tuple style (parentheses as delimiters) sequence patterns
are supported.
Relative to PEP 634, sequence patterns are also significantly affected by the
change to require explicit qualification of capture patterns and value
constraints, as it means ``case [a, b, c]:`` must instead be written as
``case [as a, as b, as c]:`` and ``case [0, 1]:`` must instead be written as
``case [== 0, == 1]:``.
With the syntax for sequence patterns no longer being derived directly from the
syntax for iterable unpacking, it no longer made sense to keep the syntactic
flexibility that had been included in the original syntax proposal purely for
consistency with iterable unpacking.
Allowing open and tuple style sequence patterns didn't increase expressivity,
only ambiguity of intent (especially relative to group patterns), and encouraged
readers down the path of viewing pattern matching syntax as intrinsically linked
to assignment target syntax (which the PEP 634 authors have stated multiple
times is not a desirable path to have readers take, and a view the author of
this PEP now shares, despite disagreeing with it originally).
Changes to mapping patterns
---------------------------
This PEP makes two notable changes to mapping patterns relative to PEP 634:
* value capturing is written as ``KEY as NAME`` rather than as ``KEY: NAME``
* a wider range of keys are permitted: any "closed expression", rather than
only literals and attribute references
As discussed above, the first change is part of ensuring that all binding
operations with the target name to the right of a subexpression or pattern
use the ``as`` keyword.
The second change is mostly a matter of simplifying the parser and code
generator code by reusing the existing expression handling machinery. The
restriction to closed expressions is designed to help reduce ambiguity as to
where the key expression ends and the match pattern begins. This mostly allows
a superset of what PEP 634 allows, except that complex literals must be written
in parentheses (at least for now).
Adapting PEP 635's mapping pattern examples to the syntax proposed in this PEP::
    match json_pet:
        case {"type": == "cat", "name" as name, "pattern" as pattern}:
            return Cat(name, pattern)
        case {"type": == "dog", "name" as name, "breed" as breed}:
            return Dog(name, breed)
        case __:
            raise ValueError("Not a suitable pet")

    def change_red_to_blue(json_obj):
        match json_obj:
            case { 'color': (== 'red' | == '#FF0000') }:
                json_obj['color'] = 'blue'
            case { 'children' as children }:
                for child in children:
                    change_red_to_blue(child)
For reference, the equivalent PEP 634 syntax::
    match json_pet:
        case {"type": "cat", "name": name, "pattern": pattern}:
            return Cat(name, pattern)
        case {"type": "dog", "name": name, "breed": breed}:
            return Dog(name, breed)
        case _:
            raise ValueError("Not a suitable pet")

    def change_red_to_blue(json_obj):
        match json_obj:
            case { 'color': ('red' | '#FF0000') }:
                json_obj['color'] = 'blue'
            case { 'children': children }:
                for child in children:
                    change_red_to_blue(child)
Changes to class patterns
-------------------------
This PEP makes several notable changes to class patterns relative to PEP 634:
* the syntactic alignment with class instantiation is abandoned as being
actively misleading and unhelpful. Instead, a new dedicated syntax for
checking additional attributes is introduced that draws inspiration from
mapping patterns rather than class instantiation
* a new dedicated syntax for simple ducktyping that will work for any class
is introduced
* the special casing of various builtin and standard library types is
supplemented by a general check for the existence of a ``__match_args__``
attribute with the value of ``None``
As discussed above, the first change has two purposes:
* it's part of ensuring that all binding operations with the target name to the
right of a subexpression or pattern use the ``as`` keyword. Using ``=`` to
assign to the right is particularly problematic.
* it's part of ensuring that all uses of simple names in patterns have a prefix
that indicates their purpose (in this case, a leading ``.`` to indicate an
attribute lookup)
The syntactic alignment with class instantiation was also judged to be unhelpful
in general, as class patterns are about matching patterns against attributes,
while class instantiation is about matching call arguments to parameters in
class constructors, which may not bear much resemblance to the resulting
instance attributes at all.
The second change is intended to make it easier to use pattern matching for the
"ducktyping" style checks that are already common in Python.
The concrete syntax proposal for these patterns then arose from viewing
instances as mappings of attribute names to values, and combining the attribute
lookup syntax (``.ATTR``), with the mapping pattern syntax ``{KEY: PATTERN}``
to give ``cls{.ATTR: PATTERN}``.
Allowing ``cls{.ATTR}`` to mean the same thing as ``cls{.ATTR: __}`` was a
matter of considering the leading ``.`` sufficient to render the name usage
unambiguous (it's clearly an attribute reference, whereas matching against a variable
key in a mapping pattern would be arguably ambiguous).
The final change just supplements a CPython-internal-only check in the PEP 634
reference implementation by making it the default behaviour that classes get if
they don't define ``__match_args__`` (the optimised fast path for the builtin
and standard library types named in PEP 634 is retained).
Adapting the class matching example
`linked from PEP 635 <https://github.com/gvanrossum/patma/blob/master/examples/expr.py#L231>`_
shows that for purely positional class matching, the main impact comes from the
changes to value constraints and name binding, not from the class matching
changes::
    match expr:
        case BinaryOp(== '+', as left, as right):
            return eval_expr(left) + eval_expr(right)
        case BinaryOp(== '-', as left, as right):
            return eval_expr(left) - eval_expr(right)
        case BinaryOp(== '*', as left, as right):
            return eval_expr(left) * eval_expr(right)
        case BinaryOp(== '/', as left, as right):
            return eval_expr(left) / eval_expr(right)
        case UnaryOp(== '+', as arg):
            return eval_expr(arg)
        case UnaryOp(== '-', as arg):
            return -eval_expr(arg)
        case VarExpr(as name):
            raise ValueError(f"Unknown value of: {name}")
        case float() | int():
            return expr
        case __:
            raise ValueError(f"Invalid expression value: {repr(expr)}")
For reference, the equivalent PEP 634 syntax::
    match expr:
        case BinaryOp('+', left, right):
            return eval_expr(left) + eval_expr(right)
        case BinaryOp('-', left, right):
            return eval_expr(left) - eval_expr(right)
        case BinaryOp('*', left, right):
            return eval_expr(left) * eval_expr(right)
        case BinaryOp('/', left, right):
            return eval_expr(left) / eval_expr(right)
        case UnaryOp('+', arg):
            return eval_expr(arg)
        case UnaryOp('-', arg):
            return -eval_expr(arg)
        case VarExpr(name):
            raise ValueError(f"Unknown value of: {name}")
        case float() | int():
            return expr
        case _:
            raise ValueError(f"Invalid expression value: {repr(expr)}")
The changes to the class pattern syntax itself are more relevant when
checking for named attributes and extracting their values without relying on
``__match_args__``::
    match expr:
        case object{.host as host, .port as port}:
            pass
        case object{.host as host}:
            pass
Compare this to the PEP 634 equivalent, where it really isn't clear which names
are referring to attributes of the match subject and which names are referring
to local variables::
match expr:
case object(host=host, port=port):
pass
case object(host=host):
pass
In this specific case, that ambiguity doesn't matter (since the attribute and
variable names are the same), but in the general case, knowing which is which
will be critical to reasoning correctly about the code being read.
Deferred Ideas
==============
Inferred value constraints
--------------------------
As discussed above, this PEP doesn't rule out the possibility of adding
inferred equality and identity constraints in the future.
These could be particularly valuable for literals, as it is quite likely that
many "magic" strings and numbers with self-evident meanings will be written
directly into match patterns, rather than being stored in named variables.
(Think constants like ``None``, or obviously special numbers like ``0`` and
``1``, or strings where their contents are as descriptive as any variable name,
rather than cryptic checks against opaque numbers like ``739452``).
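If such inference were added later, a bare literal in a pattern position would
be treated as an implied equality constraint, so the two spellings below would
become equivalent (an illustrative sketch of the deferred idea only, not syntax
proposed by this PEP)::

    match command:
        case [== "go", as direction]:    # explicit equality constraint (this PEP)
            ...
        case ["go", as direction]:       # hypothetical inferred equality constraint
            ...
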
Making some required parentheses optional
-----------------------------------------
The PEP currently errs heavily on the side of requiring parentheses in the face
of potential ambiguity.
However, there are a number of cases where it at least arguably goes too far,
mostly involving AS patterns with an explicit pattern.
In any position that requires a closed pattern, AS patterns may end up starting
with doubled parentheses, as the nested pattern is also required to be a closed
pattern: ``((OPEN PTRN) as NAME)``
Due to the requirement that the subpattern be closed, it should be reasonable
in many of these cases (e.g. sequence pattern subpatterns) to accept
``CLOSED_PTRN as NAME`` directly.
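For example, combining an equality constraint with a name binding inside a
sequence pattern currently needs both sets of parentheses, whereas the deferred
relaxation would allow the outer pair to be dropped (an illustrative sketch
only, using a hypothetical ``DEFAULT_HOST`` constant)::

    # As currently specified: the AS pattern must itself be a closed pattern
    case [((== DEFAULT_HOST) as host), as port]:
        ...

    # With the deferred relaxation: CLOSED_PTRN as NAME accepted directly
    case [(== DEFAULT_HOST) as host, as port]:
        ...
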
Further consideration of this point has been deferred, as making required
parentheses optional is a backwards compatible change, and hence relaxing the
restrictions later can be considered on a case by case basis.
Accepting complex literals as closed expressions
------------------------------------------------
PEP 634's reference implementation includes a lot of special casing of binary
operations in both the parser and the rest of the compiler in order to accept
complex literals without accepting arbitrary binary numeric operations on
literal values.
Ideally, this problem would be dealt with at the parser layer, with the parser
directly emitting a Constant AST node prepopulated with a complex number. If
that was the way things worked, then complex literals could be accepted through
a similar mechanism to any other literal.
This isn't how complex literals are handled, however. Instead, they're passed
through to the AST as regular ``BinOp`` nodes, and then the constant folding
pass on the AST resolves them down to ``Constant`` nodes with a complex value.
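The current behaviour can be seen by inspecting the raw AST, before any
constant folding has been applied (output shown is indicative only; the exact
field display varies slightly between Python versions)::

    >>> import ast
    >>> print(ast.dump(ast.parse("1 + 2j", mode="eval").body))
    BinOp(left=Constant(value=1), op=Add(), right=Constant(value=2j))
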
For the parser to resolve complex literals directly, the compiler would need to
be able to tell the tokenizer to generate a distinct token type for
imaginary numbers (e.g. ``INUMBER``), which would then allow the parser to
handle ``NUMBER + INUMBER`` and ``NUMBER - INUMBER`` separately from other
binary operations.
Alternatively, a new ``ComplexNumber`` AST node type could be defined, which
would allow the parser to notify the subsequent compiler stages that a
particular node should specifically be a complex literal, rather than an
arbitrary binary operation. Then the parser could accept ``NUMBER + NUMBER``
and ``NUMBER - NUMBER`` for that node, while letting the AST validation for
``ComplexNumber`` take care of ensuring that the real and imaginary parts of
the literal were real and imaginary numbers as expected.
For now, this PEP has postponed dealing with this question, and instead just
requires that complex literals be parenthesised in order to be used in value
constraints and as mapping pattern keys.
Allowing negated constraints in match patterns
----------------------------------------------
With the syntax proposed in this PEP, it isn't permitted to write ``!= expr``
or ``is not expr`` as a match pattern.
Both of these forms have clear potential interpretations as a negated equality
constraint (i.e. ``x != expr``) and a negated identity constraint
(i.e. ``x is not expr``).
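If this extension were ever adopted, such checks might look like the following
(an illustrative sketch of the deferred idea only, not syntax accepted by this
PEP)::

    case != expr:        # hypothetical negated equality constraint
        ...
    case is not expr:    # hypothetical negated identity constraint
        ...
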
However, it's far from clear either form would come up often enough to justify
the dedicated syntax, so the possible extension has been deferred pending further
community experience with match statements.
.. _membership constraints:
Allowing membership checks in match patterns
---------------------------------------------
The syntax used for equality and identity constraints would be straightforward
to extend to membership checks: ``in container``.
One downside of the proposals in both this PEP and PEP 634 is that checking
for multiple values in the same case doesn't look like any existing container
membership check in Python::
# PEP 634's literal patterns
match value:
case 0 | 1 | 2 | 3:
...
# This PEP's equality constraints
match value:
case == 0 | == 1 | == 2 | == 3:
...
Allowing inferred equality constraints under this PEP would only make it look
like the PEP 634 example, it still wouldn't look like the equivalent ``if``
statement header (``if value in {0, 1, 2, 3}:``).
Membership constraints would provide a more explicit, but still concise, way
to check if the match subject was present in a container, and it would look
the same as an ordinary containment check::
match value:
case in {0, 1, 2, 3}:
...
case in {one, two, three, four}:
...
case in range(4): # It would accept any container, not just literal sets
...
Such a feature would also be readily extensible to allow all kinds of case
clauses without any further syntax updates, simply by defining ``__contains__``
appropriately on a custom class definition.
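As a sketch of that extensibility (assuming the deferred ``in`` constraint
syntax, plus a hypothetical ``Between`` helper class), a range check could then
be written directly in a case clause::

    class Between:
        # Container-style helper that only implements the membership protocol
        def __init__(self, low, high):
            self.low = low
            self.high = high

        def __contains__(self, value):
            return self.low <= value <= self.high

    match reading:
        case in Between(0.0, 100.0):
            ...  # reading is within the expected range
        case __:
            ...  # out of range reading
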
However, while this does seem like a useful extension, and a good way to resolve
this PEP's verbosity problem when combining multiple equality checks in an
OR pattern, it isn't essential to making match statements a valuable addition
to the language, so it seems more appropriate to defer it to a separate proposal,
rather than including it here.
Inferring a default type for instance attribute constraints
-----------------------------------------------------------
The dedicated syntax for instance attribute constraints means that ``object``
could be omitted from ``object{.ATTR}`` to give ``{.ATTR}`` without introducing
any syntactic ambiguity (if no class was given, ``object`` would be implied,
just as it is for the base class list in class definitions).
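Under that deferred idea, the attribute matching example given earlier could be
shortened as follows (illustrative only, not proposed by this PEP)::

    match expr:
        case {.host as host, .port as port}:    # implied 'object'
            pass
        case {.host as host}:
            pass
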
However, it's far from clear saving six characters is worth making it harder to
visually distinguish mapping patterns from instance attribute patterns, so
allowing this has been deferred as a topic for possible future consideration.
Avoiding special cases in sequence patterns
-------------------------------------------
Sequence patterns in both this PEP and PEP 634 currently special case ``str``,
``bytes``, and ``bytearray`` as specifically *never* matching a sequence
pattern.
This special casing could potentially be removed if we were to define a new
``collections.abc.AtomicSequence`` abstract base class for types like these,
where they're conceptually a single item, but still implement the sequence
protocol to allow random access to their component parts.
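For illustration, the special casing means that a string subject falls through
sequence patterns entirely, even though ``str`` implements the sequence
protocol::

    match "abc":
        case [__, __, __]:
            ...  # not matched: str never matches a sequence pattern
        case str{} as text:
            ...  # matched here instead
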
Expression syntax to retrieve multiple attributes from an instance
------------------------------------------------------------------
The instance attribute pattern syntax has been designed such that it could
be used as the basis for a general purpose syntax for retrieving multiple
attributes from an object in a single expression::
host, port = obj{.host, .port}
Similar to slice syntax only being allowed inside bracket subscripts, the
``.attr`` syntax for naming attributes would only be allowed inside brace
subscripts.
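For comparison, the closest existing spelling relies on ``operator.attrgetter``
(this is current Python, shown only to illustrate the gap the new syntax would
fill)::

    from operator import attrgetter

    host, port = attrgetter("host", "port")(obj)
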
This idea isn't required for pattern matching to be useful, so it isn't part of
this PEP. However, it's mentioned as a possible path towards making pattern
matching feel more integrated into the rest of the language, rather than
existing forever in its own completely separated world.
Expression syntax to retrieve multiple items from a container
--------------------------------------------------------------
If the brace subscript syntax were to be accepted for instance attribute
pattern matching, and then subsequently extended to offer general purpose
extraction of multiple attributes, then it could be extended even further to
allow for retrieval of multiple items from containers based on the syntax
used for mapping pattern matching::
host, port = obj{"host", "port"}
first, last = obj{0, -1}
Again, this idea isn't required for pattern matching to be useful, so it isn't
part of this PEP. As with retrieving multiple attributes, however, it is
included as an example of the proposed pattern matching syntax inspiring ideas
for making object deconstruction easier in general.
Rejected Ideas
==============
Restricting permitted expressions in value constraints and mapping pattern keys
-------------------------------------------------------------------------------
While it's entirely technically possible to restrict the kinds of expressions
permitted in value constraints and mapping pattern keys to just attribute
lookups and constant literals (as PEP 634 does), there isn't any clear runtime
value in doing so, so this PEP proposes allowing any kind of primary expression
(primary expressions are an existing node type in the grammar that includes
things like literals, names, attribute lookups, function calls, container
subscripts, parenthesised groups, etc), as well as high precedence unary
operations (``+``, ``-``, ``~``) on primary expressions.
While PEP 635 does emphasise several times that literal patterns and value
patterns are not full expressions, it doesn't ever articulate a concrete benefit
that is obtained from that restriction (just a theoretical appeal to it being
useful to separate static checks from dynamic checks, which a code style
tool could still enforce, even if the compiler itself is more permissive).
The last time we imposed such a restriction was for decorator expressions and
the primary outcome of that was that users had to put up with years of awkward
syntactic workarounds (like nesting arbitrary expressions inside function calls
that just returned their argument) to express the behaviour they wanted before
the language definition was finally updated to allow arbitrary expressions and
let users make their own decisions about readability.
The situation in PEP 634 that bears a resemblance to the decorator expressions
case is that arbitrary expressions are technically supported in value patterns;
they just require awkward workarounds, where either all the values to
match need to be specified in a helper class that is placed before the match
statement::
# Allowing arbitrary match targets with PEP 634's value pattern syntax
class mt:
value = func()
match expr:
case (_, mt.value):
... # Handle the case where 'expr[1] == func()'
Or else they need to be written as a combination of a capture pattern and a
guard expression::
# Allowing arbitrary match targets with PEP 634's guard expressions
match expr:
case (_, _matched) if _matched == func():
... # Handle the case where 'expr[1] == func()'
This PEP proposes not requiring any such workarounds, and instead
supporting arbitrary value constraints from the start::
match expr:
case (__, == func()):
... # Handle the case where 'expr[1] == func()'
Whether actually writing that kind of code is a good idea would be a topic for
style guides and code linters, not the language compiler.
In particular, if static analysers can't follow certain kinds of dynamic checks,
then they can limit the permitted expressions at analysis time, rather than the
compiler restricting them at compile time.
There are also some kinds of expressions that are almost certain to give
nonsensical results (e.g. ``yield``, ``yield from``, ``await``) due to the
pattern caching rule, where the number of times the constraint expression
actually gets evaluated will be implementation dependent. Even here, the PEP
takes the view of letting users write nonsense if they really want to.
Aside from the recently updated decorator expressions, another situation where
Python's formal syntax offers full freedom of expression that is almost never
used in practice is in ``except`` clauses: the exceptions to match against
almost always take the form of a simple name, a dotted name, or a tuple of
those, but the language grammar permits arbitrary expressions at that point.
This is a good indication that Python's user base can be trusted to
take responsibility for finding readable ways to use permissive language
features, by avoiding writing hard to read constructs even when they're
permitted by the compiler.
This permissiveness comes with a real concrete benefit on the implementation
side: dozens of lines of match statement specific code in the compiler are
replaced by simple calls to the existing code for compiling expressions
(including in the AST validation pass, the AST optimization pass, the symbol
table analysis pass, and the code generation pass). This implementation
benefit would accrue not just to CPython, but to every other Python
implementation looking to add match statement support.
Requiring the use of constraint prefix markers for mapping pattern keys
-----------------------------------------------------------------------
The initial (unpublished) draft of this proposal suggested requiring mapping
pattern keys be value constraints, just as PEP 634 requires that they be valid
literal or value patterns::
import constants
match config:
case {== "route": route}:
process_route(route)
case {== constants.DEFAULT_PORT: sub_config, **rest}:
process_config(sub_config, rest)
However, the extra characters were syntactically noisy, and unlike in value
constraints (where the prefix distinguishes them from non-pattern expressions),
the prefix doesn't provide any additional information here that isn't already
conveyed by the expression's position as a key within a mapping pattern.
Accordingly, the proposal was simplified to omit the marker prefix from mapping
pattern keys.
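With the prefix markers omitted, the first case above is written the same way
as mapping patterns appear elsewhere in this PEP, with ``as`` still marking the
name binding::

    match config:
        case {"route" as route}:
            process_route(route)
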
This omission also aligns with the fact that containers may incorporate both
identity and equality checks into their lookup process - they don't purely
rely on equality checks, as would be incorrectly implied by the use of the
equality constraint prefix.
Allowing the key/value separator to be omitted for mapping value constraints
----------------------------------------------------------------------------
Instance attribute patterns allow the ``:`` separator to be omitted when
writing attribute value constraints like ``case object{.attr == expr}``.
Offering a similar shorthand for mapping value constraints was considered, but
permitting it allows thoroughly baffling constructs like ``case {0 == 0}:``
where the compiler knows this is the key ``0`` with the value constraint
``== 0``, but a human reader sees the tautological comparison operation
``0 == 0``. With the key/value separator included, the intent is more obvious to
a human reader as well: ``case {0: == 0}:``
Reference Implementation
========================
A draft reference implementation for this PEP [3]_ has been derived from Brandt
Bucher's reference implementation for PEP 634 [4]_.
Relative to the text of this PEP, the draft reference implementation has not
yet complemented the special casing of several builtin and standard library
types in ``MATCH_CLASS`` with the more general check for ``__match_args__``
being set to ``None``. Class defined patterns also currently still accept
classes that don't define ``__match_args__``.
All other modified patterns have been updated to follow this PEP rather than
PEP 634.
Unparsing for match patterns has not yet been migrated to the updated v3 AST.
The AST validator for match patterns has not yet been implemented.
The AST validator in general has not yet been reviewed to ensure that it is
checking that only expression nodes are being passed in where expression nodes
are expected.
The examples in this PEP have not yet been converted to test cases, so could
plausibly contain typos and other errors.
Several of the old PEP 634 tests are still to be converted to new SyntaxError
tests.
The documentation has not yet been updated.
Acknowledgments
===============
The PEP 622 and PEP 634/635/636 authors, as the proposal in this PEP is merely
an attempt to improve the readability of an already well-constructed idea by
proposing that starting with a more explicit syntax and potentially introducing
syntactic shortcuts for particularly common operations later is a better option
than attempting to *only* define the shortcut version. For areas of the
specification where the two PEPs are the same (or at least very similar), the
text describing the intended behaviour in this PEP is often derived directly
from the PEP 634 text.
Steven D'Aprano, who made a compelling case that the key goals of this PEP could
be achieved by using existing comparison tokens to offer the ability to override
the compiler when our guesses as to "what most users will want most of the time"
are inevitably incorrect for at least some users some of the time, and retaining
some of PEP 634's syntactic sugar (with a slightly different semantic definition)
to obtain the same level of brevity as PEP 634 in most situations. (Paul
Sokolovsky also independently suggested using ``==`` instead of ``?`` as a
more easily understood prefix for equality constraints).
Thomas Wouters, whose publication of PEP 640 and public review of the structured
pattern matching proposals persuaded the author of this PEP to continue
advocating for a wildcard pattern syntax that a future PEP could plausibly turn
into a hard keyword that always skips binding a reference in any location a
simple name is expected, rather than continuing indefinitely as the match
pattern specific soft keyword that is proposed here.
Joao Bueno and Jim Jewett for nudging the PEP author to take a closer look at
the proposed syntax for subelement capturing within class patterns and mapping
patterns (particularly the problems with "capturing to the right"). This
review is what prompted the significant changes between v2 and v3 of the
proposal.
References
==========
.. [1] Post explaining the syntactic novelties in PEP 622
https://mail.python.org/archives/list/python-dev@python.org/message/2VRPDW4EE243QT3QNNCO7XFZYZGIY6N3/
.. [2] Declined pull request proposing to list this as a Rejected Idea in PEP 622
https://github.com/python/peps/pull/1564
.. [3] In-progress reference implementation for this PEP
https://github.com/ncoghlan/cpython/tree/pep-642-constraint-patterns
.. [4] PEP 634 reference implementation
https://github.com/python/cpython/pull/22917
.. [5] Steven D'Aprano's cogent criticism of the first published iteration of this PEP
https://mail.python.org/archives/list/python-dev@python.org/message/BTHFWG6MWLHALOD6CHTUFPHAR65YN6BP/
.. [6] Thomas Wouters' initial review of the structured pattern matching proposals
https://mail.python.org/archives/list/python-dev@python.org/thread/4SBR3J5IQUYE752KR7C6432HNBSYKC5X/
.. [7] Stack Overflow answer regarding the use cases for ``_`` as an identifier
https://stackoverflow.com/questions/5893163/what-is-the-purpose-of-the-single-underscore-variable-in-python/5893946#5893946
.. [8] Pre-publication draft of "Precise Semantics for Pattern Matching"
https://github.com/markshannon/pattern-matching/blob/master/precise_semantics.rst
.. [9] Kohn et al., Dynamic Pattern Matching with Python
https://gvanrossum.github.io/docs/PyPatternMatching.pdf
.. _Appendix A:
Appendix A -- Full Grammar
==========================
Here is the full modified grammar for ``match_stmt``, replacing Appendix A
in PEP 634.
Notation used beyond standard EBNF is as per PEP 617:
- ``'KWD'`` denotes a hard keyword
- ``"KWD"`` denotes a soft keyword
- ``SEP.RULE+`` is shorthand for ``RULE (SEP RULE)*``
- ``!RULE`` is a negative lookahead assertion
::
match_stmt: "match" subject_expr ':' NEWLINE INDENT case_block+ DEDENT
subject_expr:
| star_named_expression ',' [star_named_expressions]
| named_expression
case_block: "case" (guarded_pattern | open_pattern) ':' block
guarded_pattern: closed_pattern 'if' named_expression
open_pattern: # Pattern may use multiple tokens with no closing delimiter
| as_pattern
| or_pattern
as_pattern: [closed_pattern] pattern_as_clause
as_pattern_with_inferred_wildcard: pattern_as_clause
pattern_as_clause: 'as' pattern_capture_target
pattern_capture_target: !"__" NAME !('.' | '(' | '=')
or_pattern: '|'.simple_pattern+
simple_pattern: # Subnode where "as" and "or" patterns must be parenthesised
| closed_pattern
| value_constraint
value_constraint:
| eq_constraint
| id_constraint
eq_constraint: '==' closed_expr
id_constraint: 'is' closed_expr
closed_expr: # Require a single token or a closing delimiter in expression
| primary
| closed_factor
closed_factor: # "factor" is the main grammar node for these unary ops
| '+' primary
| '-' primary
| '~' primary
closed_pattern: # Require a single token or a closing delimiter in pattern
| wildcard_pattern
| group_pattern
| structural_constraint
wildcard_pattern: "__"
group_pattern: '(' open_pattern ')'
structural_constraint:
| sequence_constraint
| mapping_constraint
| attrs_constraint
| class_constraint
sequence_constraint: '[' [sequence_constraint_elements] ']'
sequence_constraint_elements: ','.sequence_constraint_element+ ','?
sequence_constraint_element:
| star_pattern
| simple_pattern
| as_pattern_with_inferred_wildcard
star_pattern: '*' (pattern_as_clause | wildcard_pattern)
mapping_constraint: '{' [mapping_constraint_elements] '}'
mapping_constraint_elements: ','.key_value_constraint+ ','?
key_value_constraint:
| closed_expr pattern_as_clause
| closed_expr ':' simple_pattern
| double_star_capture
double_star_capture: '**' pattern_as_clause
attrs_constraint:
| name_or_attr '{' [attrs_constraint_elements] '}'
name_or_attr: attr | NAME
attr: name_or_attr '.' NAME
attrs_constraint_elements: ','.attr_value_constraint+ ','?
attr_value_constraint:
| '.' NAME pattern_as_clause
| '.' NAME value_constraint
| '.' NAME ':' simple_pattern
| '.' NAME
class_constraint:
| name_or_attr '(' ')'
| name_or_attr '(' positional_patterns ','? ')'
| name_or_attr '(' class_constraint_attrs ')'
| name_or_attr '(' positional_patterns ',' class_constraint_attrs ')'
positional_patterns: ','.positional_pattern+
positional_pattern:
| simple_pattern
| as_pattern_with_inferred_wildcard
class_constraint_attrs:
| '**' '{' [attrs_constraint_elements] '}'
.. _Appendix B:
Appendix B: Summary of Abstract Syntax Tree changes
===================================================
The following new nodes are added to the AST by this PEP::
stmt = ...
| ...
| Match(expr subject, match_case* cases)
| ...
...
match_case = (pattern pattern, expr? guard, stmt* body)
pattern = MatchAlways
| MatchValue(matchop op, expr value)
| MatchSequence(pattern* patterns)
| MatchMapping(expr* keys, pattern* patterns)
| MatchAttrs(expr cls, identifier* attrs, pattern* patterns)
| MatchClass(expr cls, pattern* patterns, identifier* extra_attrs, pattern* extra_patterns)
| MatchRestOfSequence(identifier? target)
-- A NULL entry in the MatchMapping key list handles capturing extra mapping keys
| MatchAs(pattern? pattern, identifier target)
| MatchOr(pattern* patterns)
attributes (int lineno, int col_offset, int? end_lineno, int? end_col_offset)
matchop = EqCheck | IdCheck
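As an indicative sketch of how these nodes compose (field layout assumed from
the definitions above, not verified against the reference implementation), a
clause like ``case object{.host as host, .port == 80}:`` might produce a
pattern node along the lines of::

    MatchAttrs(
        cls=Name(id='object'),
        attrs=['host', 'port'],
        patterns=[
            MatchAs(pattern=None, target='host'),
            MatchValue(op=EqCheck(), value=Constant(value=80)),
        ],
    )
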
.. _Appendix C:
Appendix C: Summary of changes relative to PEP 634
==================================================
The overall ``match``/``case`` statement syntax and the guard expression syntax
remain the same as they are in PEP 634.
Relative to PEP 634 this PEP makes the following key changes:
* a new ``pattern`` type is defined in the AST, rather than reusing the ``expr``
type for patterns
* the new ``MatchAs`` and ``MatchOr`` AST nodes are moved from the ``expr``
type to the ``pattern`` type
* the wildcard pattern changes from ``_`` (single underscore) to ``__`` (double
underscore), and gains a dedicated ``MatchAlways`` node in the AST
* due to ambiguity of intent, value patterns and literal patterns are removed
* a new expression category is introduced: "closed expressions"
* closed expressions are either primary expressions, or a closed expression
preceded by one of the high precedence unary operators (``+``, ``-``, ``~``)
* a new pattern type is introduced: "value constraint patterns"
* value constraints have a dedicated ``MatchValue`` AST node rather than
allowing a combination of ``Constant`` (literals), ``UnaryOp``
(negative numbers), ``BinOp`` (complex numbers), and ``Attribute`` (attribute
lookups)
* value constraint patterns are either equality constraints or identity constraints
* equality constraints use ``==`` as a prefix marker on an otherwise
arbitrary closed expression: ``== EXPR``
* identity constraints use ``is`` as a prefix marker on an otherwise
arbitrary closed expression: ``is EXPR``
* due to ambiguity of intent, capture patterns are removed. All capture operations
use the ``as`` keyword (even in sequence matching) and are represented in the
AST as either ``MatchAs`` or ``MatchRestOfSequence`` nodes.
* to reduce verbosity in AS patterns, ``as NAME`` is permitted, with the same
meaning as ``__ as NAME``
* sequence patterns change to *require* the use of square brackets, rather than
offering the same syntactic flexibility as assignment targets (assignment
statements allow iterable unpacking to be indicated by any use of a tuple
separated target, with or without surrounding parentheses or square brackets)
* sequence patterns gain a dedicated ``MatchSequence`` AST node rather than
reusing ``List``
* mapping patterns change to allow arbitrary closed expressions as keys
* mapping patterns gain a dedicated ``MatchMapping`` AST node rather than
reusing ``Dict``
* to reduce verbosity in mapping patterns, ``KEY : __ as NAME`` may be shortened
to ``KEY as NAME``
* class patterns no longer use individual keyword argument syntax for attribute
matching. Instead they use double-star syntax, along with a variant on mapping
pattern syntax with a dot prefix on the attribute names
* class patterns gain a dedicated ``MatchClass`` AST node rather than
reusing ``Call``
* to reduce verbosity, class attribute matching allows ``:`` to be omitted when
the pattern to be matched starts with ``==``, ``is``, or ``as``
* class patterns treat any class that sets ``__match_args__`` to ``None`` as
accepting a single positional pattern that is matched against the entire
object (avoiding the special casing required in PEP 634)
* class patterns raise ``TypeError`` when used with an object that does not
define ``__match_args__``
* dedicated syntax for ducktyping is added, such that ``case cls{...}:`` is
roughly equivalent to ``case cls(**{...}):``, but skips the check for the
existence of ``__match_args__``. This pattern also has a dedicated AST node,
``MatchAttrs``
Note that postponing literal patterns also makes it possible to postpone the
question of whether we need an "INUMBER" token in the tokeniser for imaginary
literals. Without it, the parser can't distinguish complex literals from other
binary addition and subtraction operations on constants, so proposals like
PEP 634 have to do work in later compilation steps to check for correct usage.
.. _Appendix D:
Appendix D: History of changes to this proposal
===============================================
The first published iteration of this proposal mostly followed PEP 634, but
suggested using ``?EXPR`` for equality constraints and ``?is EXPR`` for
identity constraints rather than PEP 634's value patterns and literal patterns.
The second published iteration mostly adopted a counter-proposal from Steven
D'Aprano that kept the PEP 634 style inferred constraints in many situations,
but also allowed the use of ``== EXPR`` for explicit equality constraints, and
``is EXPR`` for explicit identity constraints.
The third published (and current) iteration dropped inferred patterns entirely,
in an attempt to resolve the concerns with the fact that the patterns
``case {key: NAME}:`` and ``case cls(attr=NAME):`` would both bind ``NAME``
despite it appearing to the right of another subexpression without using the
``as`` keyword. The revised proposal also eliminates the possibility of writing
``case TARGET1 as TARGET2:``, which would bind to both of the given names. Of
those changes, the most concerning was ``case cls(attr=TARGET_NAME):``, since it
involved the use of ``=`` with the binding target on the right, the exact
opposite of what happens in assignment statements, function calls, and
function signature declarations.
Copyright
=========
This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: