PEP: 642
Title: Explicit Pattern Syntax for Structural Pattern Matching
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan@gmail.com>
BDFL-Delegate:
Discussions-To: python-dev@python.org
Status: Rejected
Type: Standards Track
Content-Type: text/x-rst
Requires: 634
Created: 26-Sep-2020
Python-Version: 3.10
Post-History: 31-Oct-2020, 08-Nov-2020, 03-Jan-2021
Resolution: https://mail.python.org/archives/list/python-dev@python.org/message/SQC2FTLFV5A7DV7RCEAR2I2IKJKGK7W3/

Abstract
========

This PEP covers an alternative syntax proposal for :pep:`634`'s structural pattern
matching that requires explicit prefixes on all capture patterns and value
constraints. It also proposes a new dedicated syntax for instance attribute
patterns that aligns more closely with the proposed mapping pattern syntax.

While the result is necessarily more verbose than the proposed syntax in
:pep:`634`, it is still significantly less verbose than the status quo.

As an example, the following match statement would extract "host" and "port"
details from a 2 item sequence, a mapping with "host" and "port" keys, any
object with "host" and "port" attributes, or a "host:port" string, treating
the "port" as optional in the latter three cases::

    port = DEFAULT_PORT
    match expr:
        case [as host, as port]:
            pass
        case {"host" as host, "port" as port}:
            pass
        case {"host" as host}:
            pass
        case object{.host as host, .port as port}:
            pass
        case object{.host as host}:
            pass
        case str{} as addr:
            host, __, optional_port = addr.partition(":")
            if optional_port:
                port = optional_port
        case __ as m:
            raise TypeError(f"Unknown address format: {m!r:.200}")
    port = int(port)

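For comparison, the following is a rough sketch of how the same extraction
might be written today without a match statement. The ``parse_address``
function name and the ``DEFAULT_PORT`` value are assumptions made purely for
illustration, and the checks only approximate the matching semantics specified
later in this PEP (for example, attribute presence is tested with ``hasattr``
here)::

    from collections.abc import Mapping, Sequence

    DEFAULT_PORT = 8080  # hypothetical default, not taken from the PEP


    def parse_address(expr):
        """Return (host, port) for the address formats in the example above."""
        port = DEFAULT_PORT
        if (isinstance(expr, Sequence)
                and not isinstance(expr, (str, bytes, bytearray))
                and len(expr) == 2):
            # 2 item sequence: both host and port are required
            host, port = expr
        elif isinstance(expr, Mapping) and "host" in expr:
            # Mapping: the "port" key is optional
            host = expr["host"]
            port = expr.get("port", port)
        elif hasattr(expr, "host"):
            # Any object with a "host" attribute: "port" is optional
            host = expr.host
            port = getattr(expr, "port", port)
        elif isinstance(expr, str):
            # "host:port" string: the ":port" suffix is optional
            host, _, optional_port = expr.partition(":")
            if optional_port:
                port = optional_port
        else:
            raise TypeError(f"Unknown address format: {expr!r:.200}")
        return host, int(port)

The checks are deliberately ordered to mirror the case blocks in the match
statement, since the first successful check wins in both versions.
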
At a high level, this PEP proposes to categorise the different available pattern
types as follows:

* wildcard pattern: ``__``
* group patterns: ``(PTRN)``
* value constraint patterns:

  * equality constraints: ``== EXPR``
  * identity constraints: ``is EXPR``

* structural constraint patterns:

  * sequence constraint patterns: ``[PTRN, as NAME, PTRN as NAME]``
  * mapping constraint patterns: ``{EXPR: PTRN, EXPR as NAME}``
  * instance attribute constraint patterns:
    ``CLS{.NAME, .NAME: PTRN, .NAME == EXPR, .NAME as NAME}``
  * class defined constraint patterns:
    ``CLS(PTRN, PTRN, **{.NAME, .NAME: PTRN, .NAME == EXPR, .NAME as NAME})``

* OR patterns: ``PTRN | PTRN | PTRN``
* AS patterns: ``PTRN as NAME`` (omitting the pattern implies ``__``)

The intent of this approach is to:

* allow an initial form of pattern matching to be developed and released without
  needing to decide up front on the best default options for handling bare names,
  attribute lookups, and literal values
* ensure that pattern matching is defined explicitly at the Abstract Syntax Tree
  level, allowing the specifications of the semantics and the surface syntax for
  pattern matching to be clearly separated
* define a clear and concise "ducktyping" syntax that could potentially be
  adopted in ordinary expressions as a way to more easily retrieve a tuple
  containing multiple attributes from the same object

Relative to :pep:`634`, the proposal also deliberately eliminates any syntax that
"binds to the right" without using the ``as`` keyword (using capture patterns
in :pep:`634`'s mapping patterns and class patterns) or binds to both the left and
the right in the same pattern (using :pep:`634`'s capture patterns with AS patterns).


Relationship with other PEPs
============================

This PEP both depends on and competes with :pep:`634` - the PEP author agrees that
match statements would be a sufficiently valuable addition to the language to
be worth the additional complexity that they add to the learning process, but
disagrees with the idea that "simple name vs literal or attribute lookup"
really offers an adequate syntactic distinction between name binding and value
lookup operations in match patterns (at least for Python).

This PEP agrees with the spirit of :pep:`640` (that the chosen wildcard pattern to
skip a name binding should be supported everywhere, not just in match patterns),
but is now proposing a different spelling for the wildcard syntax (``__`` rather
than ``?``). As such, it competes with :pep:`640` as written, but would complement
a proposal to deprecate the use of ``__`` as an ordinary identifier and instead
turn it into a general purpose wildcard marker that always skips making a new
local variable binding.

While it has not yet been put forward as a PEP, Mark Shannon has a pre-PEP draft
[8]_ expressing several concerns about the runtime semantics of the pattern
matching proposal in :pep:`634`. This PEP is somewhat complementary to that one, as
even though this PEP is mostly about surface syntax changes rather than major
semantic changes, it does propose that the Abstract Syntax Tree definition be
made more explicit to better separate the details of the surface syntax from the
semantics of the code generation step. There is one specific idea in that pre-PEP
draft that this PEP explicitly rejects: the idea that the different kinds of
matching are mutually exclusive. It's entirely possible for the same value to
match different kinds of structural pattern, and which one takes precedence will
intentionally be governed by the order of the cases in the match statement.


Motivation
==========

The original :pep:`622` (which was later split into :pep:`634`, :pep:`635`, and :pep:`636`)
incorporated an unstated but essential assumption in its syntax design: that
neither ordinary expressions *nor* the existing assignment target syntax provide
an adequate foundation for the syntax used in match patterns.

While the PEP didn't explicitly state this assumption, one of the PEP authors
explained it clearly on python-dev [1]_:

    The actual problem that I see is that we have different cultures/intuitions
    fundamentally clashing here. In particular, so many programmers welcome
    pattern matching as an "extended switch statement" and find it therefore
    strange that names are binding and not expressions for comparison. Others
    argue that it is at odds with current assignment statements, say, and
    question why dotted names are _/not/_ binding. What all groups seem to
    have in common, though, is that they refer to _/their/_ understanding and
    interpretation of the new match statement as 'consistent' or 'intuitive'
    --- naturally pointing out where we as PEP authors went wrong with our
    design.

    But here is the catch: at least in the Python world, pattern matching as
    proposed by this PEP is an unprecedented and new way of approaching a common
    problem. It is not simply an extension of something already there. Even
    worse: while designing the PEP we found that no matter from which angle you
    approach it, you will run into issues of seeming 'inconsistencies' (which is
    to say that pattern matching cannot be reduced to a 'linear' extension of
    existing features in a meaningful way): there is always something that goes
    fundamentally beyond what is already there in Python. That's why I argue
    that arguments based on what is 'intuitive' or 'consistent' just do not
    make sense _/in this case/_.

The first iteration of this PEP was then born out of an attempt to show that the
second assertion was not accurate, and that match patterns could be treated
as a variation on assignment targets without leading to inherent contradictions.
(An earlier PR submitted to list this option in the "Rejected Ideas" section
of the original :pep:`622` had previously been declined [2]_).

However, the review process for this PEP strongly suggested that not only did
the contradictions that Tobias mentioned in his email exist, but they were also
concerning enough to cast doubts on the syntax proposal presented in :pep:`634`.
Accordingly, this PEP was changed to go even further than :pep:`634`, and largely
abandon alignment between the sequence matching syntax and the existing iterable
unpacking syntax (effectively answering "Not really, at least as far as the
exact syntax is concerned" to the first question raised in the DLS'20 paper
[9]_: "Can we extend a feature like iterable unpacking to work for more general
object and data layouts?").

This resulted in a complete reversal of the goals of the PEP: rather than
attempting to emphasise the similarities between assignment and pattern matching,
the PEP now attempts to make sure that assignment target syntax isn't being
reused *at all*, reducing the likelihood of incorrect inferences being drawn
about the new construct based on experience with existing ones.

Finally, before completing the 3rd iteration of the proposal (which dropped
inferred patterns entirely), the PEP author spent quite a bit of time reflecting
on the following entries in :pep:`20`:

* Explicit is better than implicit.
* Special cases aren't special enough to break the rules.
* In the face of ambiguity, refuse the temptation to guess.

If we start with an explicit syntax, we can always add syntactic shortcuts later
(e.g. consider the recent proposals to add shortcuts for ``Union`` and
``Optional`` type hints only after years of experience with the original more
verbose forms), while if we start out with only the abbreviated forms,
then we don't have any real way to revisit those decisions in a future release.


Specification
=============

This PEP retains the overall ``match``/``case`` statement structure and semantics
from :pep:`634`, but proposes multiple changes that mean that user intent is
explicitly specified in the concrete syntax rather than needing to be inferred
from the pattern matching context.

In the proposed Abstract Syntax Tree, the semantics are also always explicit,
with no inference required.


The Match Statement
-------------------

Surface syntax::

    match_stmt: "match" subject_expr ':' NEWLINE INDENT case_block+ DEDENT
    subject_expr:
        | star_named_expression ',' star_named_expressions?
        | named_expression
    case_block: "case" (guarded_pattern | open_pattern) ':' block

    guarded_pattern: closed_pattern 'if' named_expression

    open_pattern:
        | as_pattern
        | or_pattern

    closed_pattern:
        | wildcard_pattern
        | group_pattern
        | structural_constraint

Abstract syntax::

    Match(expr subject, match_case* cases)
    match_case = (pattern pattern, expr? guard, stmt* body)


The rules ``star_named_expression``, ``star_named_expressions``,
``named_expression`` and ``block`` are part of the `standard Python
grammar <https://docs.python.org/3.10/reference/grammar.html>`_.

Open patterns are patterns which consist of multiple tokens, and aren't
necessarily terminated by a closing delimiter (for example, ``__ as x``,
``int() | bool()``). To avoid ambiguity for human readers, their usage is
restricted to top level patterns and to group patterns (which are patterns
surrounded by parentheses).

Closed patterns are patterns which either consist of a single token
(i.e. ``__``), or else have a closing delimiter as a required part of their
syntax (e.g. ``[as x, as y]``, ``object{.x as x, .y as y}``).

As in :pep:`634`, the ``match`` and ``case`` keywords are soft keywords, i.e. they
are not reserved words in other grammatical contexts (including at the
start of a line if there is no colon where expected). This means
that they are recognized as keywords when part of a match
statement or case block only, and are allowed to be used in all
other contexts as variable or argument names.

Unlike :pep:`634`, patterns are explicitly defined as a new kind of node in the
abstract syntax tree - even when surface syntax is shared with existing
expression nodes, a distinct abstract node is emitted by the parser.

For context, ``match_stmt`` is a new alternative for
``compound_statement`` in the surface syntax and ``Match`` is a new
alternative for ``stmt`` in the abstract syntax.


Match Semantics
^^^^^^^^^^^^^^^

This PEP largely retains the overall pattern matching semantics proposed in
:pep:`634`.

The proposed syntax for patterns changes significantly, and is discussed in
detail below.

There are also some proposed changes to the semantics of class defined
constraints (class patterns in :pep:`634`) to eliminate the need to special case
any builtin types (instead, the introduction of dedicated syntax for instance
attribute constraints allows the behaviour needed by those builtin types to be
specified as applying to any type that sets ``__match_args__`` to ``None``).


.. _guards:

Guards
^^^^^^

This PEP retains the guard clause semantics proposed in :pep:`634`.

However, the syntax is changed slightly to require that when a guard clause
is present, the case pattern must be a *closed* pattern.

This makes it clearer to the reader where the pattern ends and the guard clause
begins. (This is mainly a potential problem with OR patterns, where the guard
clause looks somewhat like the start of a conditional expression in the final
pattern. Actually doing that isn't legal syntax, so there's no ambiguity as far
as the compiler is concerned, but the distinction may not be as clear to a human
reader.)


Irrefutable case blocks
^^^^^^^^^^^^^^^^^^^^^^^

The definition of irrefutable case blocks changes slightly in this PEP relative
to :pep:`634`, as capture patterns no longer exist as a separate concept from
AS patterns.

Aside from that caveat, the handling of irrefutable cases is the same as in
:pep:`634`:

* wildcard patterns are irrefutable
* AS patterns whose left-hand side is irrefutable are irrefutable (this
  includes AS patterns that omit the left-hand side, as the wildcard pattern
  is then implied)
* OR patterns containing at least one irrefutable pattern are irrefutable
* parenthesized irrefutable patterns are irrefutable
* a case block is considered irrefutable if it has no guard and its
  pattern is irrefutable.
* a match statement may have at most one irrefutable case block, and it
  must be last.


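The irrefutability rules listed above can be sketched as a small recursive
check. The dataclasses below are illustrative stand-ins that borrow the node
names from the abstract syntax proposed in this PEP; they are not the real
``ast`` module classes, and ``is_irrefutable`` and ``case_is_irrefutable`` are
hypothetical helper names. Group patterns need no handling of their own, as
they are not represented in the abstract syntax tree::

    from dataclasses import dataclass, field
    from typing import List, Optional


    class Pattern:
        """Base class for the illustrative pattern nodes in this sketch."""


    @dataclass
    class MatchAlways(Pattern):
        pass


    @dataclass
    class MatchValue(Pattern):
        op: str  # "==" or "is"
        value: object


    @dataclass
    class MatchAs(Pattern):
        pattern: Optional[Pattern]  # None means the wildcard is implied
        target: str


    @dataclass
    class MatchOr(Pattern):
        patterns: List[Pattern] = field(default_factory=list)


    def is_irrefutable(pattern):
        """Return True if the pattern can never fail to match."""
        if isinstance(pattern, MatchAlways):
            return True
        if isinstance(pattern, MatchAs):
            return pattern.pattern is None or is_irrefutable(pattern.pattern)
        if isinstance(pattern, MatchOr):
            return any(is_irrefutable(p) for p in pattern.patterns)
        # Value and structural constraints can always fail
        return False


    def case_is_irrefutable(pattern, guard=None):
        """A case block is irrefutable only when it also has no guard."""
        return guard is None and is_irrefutable(pattern)
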
.. _patterns:

Patterns
--------

The top-level surface syntax for patterns is as follows::

    open_pattern: # Pattern may use multiple tokens with no closing delimiter
        | as_pattern
        | or_pattern

    as_pattern: [closed_pattern] pattern_as_clause

    or_pattern: '|'.simple_pattern+

    simple_pattern: # Subnode where "as" and "or" patterns must be parenthesised
        | closed_pattern
        | value_constraint

    closed_pattern: # Require a single token or a closing delimiter in pattern
        | wildcard_pattern
        | group_pattern
        | structural_constraint

As described above, the usage of open patterns is limited to top level case
clauses and when parenthesised in a group pattern.

The abstract syntax for patterns explicitly indicates which elements are
subpatterns and which elements are subexpressions or identifiers::

    pattern = MatchAlways
         | MatchValue(matchop op, expr value)
         | MatchSequence(pattern* patterns)
         | MatchMapping(expr* keys, pattern* patterns)
         | MatchAttrs(expr cls, identifier* attrs, pattern* patterns)
         | MatchClass(expr cls, pattern* patterns, identifier* extra_attrs, pattern* extra_patterns)

         | MatchRestOfSequence(identifier? target)
         -- A NULL entry in the MatchMapping key list handles capturing extra mapping keys

         | MatchAs(pattern? pattern, identifier target)
         | MatchOr(pattern* patterns)


AS Patterns
^^^^^^^^^^^

Surface syntax::

    as_pattern: [closed_pattern] pattern_as_clause
    pattern_as_clause: 'as' pattern_capture_target
    pattern_capture_target: !"__" NAME !('.' | '(' | '=')

(Note: the name on the right may not be ``__``.)

Abstract syntax::

    MatchAs(pattern? pattern, identifier target)

An AS pattern matches the closed pattern on the left of the ``as``
keyword against the subject. If this fails, the AS pattern fails.
Otherwise, the AS pattern binds the subject to the name on the right
of the ``as`` keyword and succeeds.

If no pattern to match is given, the wildcard pattern (``__``) is implied.

To avoid confusion with the `wildcard pattern`_, the double underscore (``__``)
is not permitted as a capture target (this is what ``!"__"`` expresses).

An AS pattern that omits the pattern on the left always succeeds. In all
cases, the subject value is bound to the name using the scoping rules for
name binding established for named expressions
in :pep:`572`. (Summary: the name becomes a local
variable in the closest containing function scope unless there's an
applicable ``nonlocal`` or ``global`` statement.)

In a given pattern, a given name may be bound only once. This
disallows for example ``case [as x, as x]: ...`` but allows
``case [as x] | (as x): ...``.

As an open pattern, the usage of AS patterns is limited to top level case
clauses and when parenthesised in a group pattern. However, several of the
structural constraints allow the use of ``pattern_as_clause`` in relevant
locations to bind extracted elements of the matched subject to local variables.
These are mostly represented in the abstract syntax tree as ``MatchAs`` nodes,
aside from the dedicated ``MatchRestOfSequence`` node in sequence patterns.


OR Patterns
^^^^^^^^^^^

Surface syntax::

    or_pattern: '|'.simple_pattern+

    simple_pattern: # Subnode where "as" and "or" patterns must be parenthesised
        | closed_pattern
        | value_constraint

Abstract syntax::

    MatchOr(pattern* patterns)

When two or more patterns are separated by vertical bars (``|``),
this is called an OR pattern. (A single simple pattern is just that.)

Only the final subpattern may be irrefutable.

Each subpattern must bind the same set of names.

An OR pattern matches each of its subpatterns in turn to the subject,
until one succeeds. The OR pattern is then deemed to succeed.
If none of the subpatterns succeed the OR pattern fails.

Subpatterns are mostly required to be closed patterns, but the parentheses may
be omitted for value constraints.


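The left-to-right, first-success behaviour of OR patterns described in this
section can be emulated in ordinary Python. In this sketch each subpattern is
modelled as a callable that returns a dict of name bindings on success and
``None`` on failure; that representation, along with the ``match_or`` and
``eq_constraint`` helper names, is an assumption made for illustration rather
than part of the proposal::

    def match_or(subject, subpatterns):
        """Try each subpattern in order and return the first successful result.

        Returns the bindings of the first subpattern that matches, or None
        if every subpattern fails.
        """
        for subpattern in subpatterns:
            bindings = subpattern(subject)
            if bindings is not None:
                return bindings  # Later subpatterns are never attempted
        return None


    def eq_constraint(value):
        """Build a matcher emulating an ``== value`` constraint."""
        return lambda subject: {} if subject == value else None

For example, ``match_or(1, [eq_constraint(0), eq_constraint(1)])`` returns an
empty dict of bindings, while the same call with a subject of ``2`` returns
``None``.
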
.. _value_constraints:

Value constraints
^^^^^^^^^^^^^^^^^

Surface syntax::

    value_constraint:
        | eq_constraint
        | id_constraint

    eq_constraint: '==' closed_expr
    id_constraint: 'is' closed_expr

    closed_expr: # Require a single token or a closing delimiter in expression
        | primary
        | closed_factor

    closed_factor: # "factor" is the main grammar node for these unary ops
        | '+' primary
        | '-' primary
        | '~' primary

Abstract syntax::

    MatchValue(matchop op, expr value)
    matchop = EqCheck | IdCheck


The rule ``primary`` is defined in the standard Python grammar, and only
allows expressions that either consist of a single token, or else are required
to end with a closing delimiter.

Value constraints replace :pep:`634`'s literal patterns and value patterns.

Equality constraints are written as ``== EXPR``, while identity constraints are
written as ``is EXPR``.

An equality constraint succeeds if the subject value compares equal to the
value given on the right, while an identity constraint succeeds only if they are
the exact same object.

The expressions to be compared against are largely restricted to either
single tokens (e.g. names, strings, numbers, builtin constants), or else to
expressions that are required to end with a closing delimiter.

The use of the high precedence unary operators is also permitted, as the risk of
perceived ambiguity is low, and being able to specify negative numbers without
parentheses is desirable.

When the same constraint expression occurs multiple times in the same match
statement, the interpreter may cache the first value calculated and reuse it,
rather than repeat the expression evaluation. (As for :pep:`634` value patterns,
this cache is strictly tied to a given execution of a given match statement.)

Unlike literal patterns in :pep:`634`, this PEP requires that complex
literals be parenthesised to be accepted by the parser. See the Deferred
Ideas section for discussion on that point.

If this PEP were to be adopted in preference to :pep:`634`, then all literal and
value patterns would instead be written more explicitly as value constraints::

    # Literal patterns
    match number:
        case == 0:
            print("Nothing")
        case == 1:
            print("Just one")
        case == 2:
            print("A couple")
        case == -1:
            print("One less than nothing")
        case == (1-1j):
            print("Good luck with that...")

    # Additional literal patterns
    match value:
        case == True:
            print("True or 1")
        case == False:
            print("False or 0")
        case == None:
            print("None")
        case == "Hello":
            print("Text 'Hello'")
        case == b"World!":
            print("Binary 'World!'")

    # Matching by identity rather than equality
    SENTINEL = object()
    match value:
        case is True:
            print("True, not 1")
        case is False:
            print("False, not 0")
        case is None:
            print("None, following PEP 8 comparison guidelines")
        case is ...:
            print("May be useful when writing __getitem__ methods?")
        case is SENTINEL:
            print("Matches the sentinel by identity, not just value")

    # Matching against variables and attributes
    from enum import Enum
    class Sides(str, Enum):
        SPAM = "Spam"
        EGGS = "eggs"
        ...

    preferred_side = Sides.EGGS
    match entree[-1]:
        case == Sides.SPAM: # Compares entree[-1] == Sides.SPAM.
            response = "Have you got anything without Spam?"
        case == preferred_side: # Compares entree[-1] == preferred_side
            response = f"Oh, I love {preferred_side}!"
        case as side: # Assigns side = entree[-1].
            response = f"Well, could I have their Spam instead of the {side} then?"

Note the ``== preferred_side`` example: using an explicit prefix marker on
constraint expressions removes the restriction to only working with attributes
or literals for value lookups.

The ``== (1-1j)`` example illustrates the use of parentheses to turn any
subexpression into a closed one.


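The difference between the equality and identity constraints in the examples
above can be checked in current Python using the ``operator`` module. The
``check_value_constraint`` helper and the ``MATCH_OPS`` table are hypothetical
names used only for this sketch, standing in for the ``EqCheck`` and
``IdCheck`` operations of the proposed abstract syntax::

    import operator

    # Map the surface syntax prefix to the comparison it requests
    MATCH_OPS = {"==": operator.eq, "is": operator.is_}


    def check_value_constraint(subject, op, value):
        """Return True if the subject satisfies the given value constraint."""
        return bool(MATCH_OPS[op](subject, value))


    SENTINEL = object()

    print(check_value_constraint(1, "==", True))   # equality: 1 == True
    print(check_value_constraint(1, "is", True))   # identity: 1 is not True
    print(check_value_constraint(SENTINEL, "is", SENTINEL))

This mirrors the "True or 1" versus "True, not 1" distinction drawn in the
example match statements.
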
.. _wildcard_pattern:

Wildcard Pattern
^^^^^^^^^^^^^^^^

Surface syntax::

    wildcard_pattern: "__"

Abstract syntax::

    MatchAlways

A wildcard pattern always succeeds. As in :pep:`634`, it binds no name.

Where :pep:`634` chooses the single underscore as its wildcard pattern for
consistency with other languages, this PEP chooses the double underscore as that
has a clearer path towards potentially being made consistent across the entire
language, whereas that path is blocked for ``"_"`` by i18n related use cases.

Example usage::

    match sequence:
        case [__]:               # any sequence with a single element
            return True
        case [as start, *__, as end]: # a sequence with at least two elements
            return start == end
        case __:                 # anything
            return False



Group Patterns
^^^^^^^^^^^^^^

Surface syntax::

    group_pattern: '(' open_pattern ')'

For the syntax of ``open_pattern``, see Patterns above.

A parenthesized pattern has no additional syntax and is not represented in the
abstract syntax tree. It allows users to add parentheses around patterns to
emphasize the intended grouping, and to allow nesting of open patterns when the
grammar requires a closed pattern.

Unlike :pep:`634`, there is no potential ambiguity with sequence patterns, as
this PEP requires that all sequence patterns be written with square brackets.


Structural constraints
^^^^^^^^^^^^^^^^^^^^^^

Surface syntax::

    structural_constraint:
        | sequence_constraint
        | mapping_constraint
        | attrs_constraint
        | class_constraint

Note: the separate "structural constraint" subcategory isn't used in the
abstract syntax tree; it's merely used as a convenient grouping node in the
surface syntax definition.

Structural constraints are patterns used to both make assertions about complex
objects and to extract values from them.

These patterns may all bind multiple values, either through the use of nested
AS patterns, or else through the use of ``pattern_as_clause`` elements included
in the definition of the pattern.


Sequence constraints
^^^^^^^^^^^^^^^^^^^^

Surface syntax::

    sequence_constraint: '[' [sequence_constraint_elements] ']'
    sequence_constraint_elements: ','.sequence_constraint_element+ ','?
    sequence_constraint_element:
        | star_pattern
        | simple_pattern
        | pattern_as_clause
    star_pattern: '*' (pattern_as_clause | wildcard_pattern)

    simple_pattern: # Subnode where "as" and "or" patterns must be parenthesised
        | closed_pattern
        | value_constraint

    pattern_as_clause: 'as' pattern_capture_target

Abstract syntax::

    MatchSequence(pattern* patterns)

    MatchRestOfSequence(identifier? target)

Sequence constraints allow items within a sequence to be checked and
optionally extracted.

A sequence pattern fails if the subject value is not an instance of
``collections.abc.Sequence``. It also fails if the subject value is
an instance of ``str``, ``bytes`` or ``bytearray`` (see Deferred Ideas for
a discussion on potentially removing the need for this special casing).

A sequence pattern may contain at most one star subpattern. The star
subpattern may occur in any position and is represented in the AST using the
``MatchRestOfSequence`` node.

If no star subpattern is present, the sequence pattern is a fixed-length
sequence pattern; otherwise it is a variable-length sequence pattern.

A fixed-length sequence pattern fails if the length of the subject
sequence is not equal to the number of subpatterns.

A variable-length sequence pattern fails if the length of the subject
sequence is less than the number of non-star subpatterns.

The length of the subject sequence is obtained using the builtin
``len()`` function (i.e., via the ``__len__`` protocol). However, the
interpreter may cache this value in a similar manner as described for
value constraint expressions.

A fixed-length sequence pattern matches the subpatterns to
corresponding items of the subject sequence, from left to right.
Matching stops (with a failure) as soon as a subpattern fails. If all
subpatterns succeed in matching their corresponding item, the sequence
pattern succeeds.

A variable-length sequence pattern first matches the leading non-star
subpatterns to the corresponding items of the subject sequence, as for
a fixed-length sequence. If this succeeds, the star subpattern
matches a list formed of the remaining subject items, with items
removed from the end corresponding to the non-star subpatterns
following the star subpattern. The remaining non-star subpatterns are
then matched to the corresponding subject items, as for a fixed-length
sequence.

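The type and length checks described above, along with the way a star
subpattern divides the subject, can be sketched as follows. The
``split_sequence`` helper and its parameter names are hypothetical, and the
sketch only covers how the subject is partitioned; matching the individual
subpatterns against the resulting items is left out::

    from collections.abc import Sequence


    def split_sequence(subject, n_leading, n_trailing=0, has_star=False):
        """Return (leading, star, trailing) items, or None if there is no match.

        n_leading and n_trailing count the non-star subpatterns before and
        after the star subpattern (n_trailing is 0 when there is no star).
        """
        if not isinstance(subject, Sequence):
            return None
        if isinstance(subject, (str, bytes, bytearray)):
            return None  # Special cased: never matched as sequences
        length = len(subject)
        required = n_leading + n_trailing
        if has_star:
            if length < required:
                return None
        elif length != required:
            return None
        leading = [subject[i] for i in range(n_leading)]
        # The star subpattern always receives a list, even if it is empty
        star = [subject[i] for i in range(n_leading, length - n_trailing)]
        trailing = [subject[i] for i in range(length - n_trailing, length)]
        return leading, (star if has_star else None), trailing

For example, a pattern such as ``[as first, *as middle, as last]`` corresponds
to ``split_sequence(subject, 1, 1, has_star=True)`` in this sketch.
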
Subpatterns are mostly required to be closed patterns, but the parentheses may
be omitted for value constraints. Sequence elements may also be captured
unconditionally without parentheses.

Note: where :pep:`634` allows all the same syntactic flexibility as iterable
unpacking in assignment statements, this PEP restricts sequence patterns
specifically to the square bracket form. Given that the open and parenthesised
forms are far more popular than square brackets for iterable unpacking, this
helps emphasise that iterable unpacking and sequence matching are *not* the
same operation. It also avoids the parenthesised form's ambiguity problem
between single element sequence patterns and group patterns.


Mapping constraints
^^^^^^^^^^^^^^^^^^^

Surface syntax::

    mapping_constraint: '{' [mapping_constraint_elements] '}'
    mapping_constraint_elements: ','.key_value_constraint+ ','?
    key_value_constraint:
        | closed_expr pattern_as_clause
        | closed_expr ':' simple_pattern
        | double_star_capture
    double_star_capture: '**' pattern_as_clause

(Note that ``**__`` is deliberately disallowed by this syntax, as additional
mapping entries are ignored by default.)

``closed_expr`` is defined above, under value constraints.

Abstract syntax::

    MatchMapping(expr* keys, pattern* patterns)

Mapping constraints allow keys and values within a mapping to be checked and
values to optionally be extracted.

A mapping pattern fails if the subject value is not an instance of
``collections.abc.Mapping``.

A mapping pattern succeeds if every key given in the mapping pattern
is present in the subject mapping, and the pattern for
each key matches the corresponding item of the subject mapping.

The presence of keys is checked using the two argument form of the ``get``
method and a unique sentinel value, which offers the following benefits:

* no exceptions need to be created in the lookup process
* mappings that implement ``__missing__`` (such as ``collections.defaultdict``)
  only match on keys that they already contain; they don't implicitly add keys

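The key lookup approach described above can be sketched in a few lines. The
``lookup_keys`` helper and the ``_MISSING`` sentinel are hypothetical names
chosen for illustration; the sketch only covers the presence check, not the
matching of value subpatterns::

    from collections import defaultdict
    from collections.abc import Mapping

    _MISSING = object()  # Unique sentinel that no mapping can contain


    def lookup_keys(subject, keys):
        """Return the values for all keys, or None if the subject fails."""
        if not isinstance(subject, Mapping):
            return None
        values = []
        for key in keys:
            # Two argument get(): no KeyError, and __missing__ is not invoked
            value = subject.get(key, _MISSING)
            if value is _MISSING:
                return None
            values.append(value)
        return values


    address = defaultdict(int, host="example.org")
    print(lookup_keys(address, ["host"]))
    print(lookup_keys(address, ["host", "port"]))
    print("port" in address)

Because ``get`` bypasses ``__missing__``, the failed lookup of ``"port"``
leaves the ``defaultdict`` unchanged.
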
A mapping pattern may not contain duplicate key values. If duplicate keys are
detected when checking the mapping pattern, the pattern is considered invalid,
and a ``ValueError`` is raised. While it would theoretically be possible to
check for duplicated constant keys at compile time, no such check is currently
defined or implemented.

(Note: This semantic description is derived from the :pep:`634` reference
implementation, which differs from the :pep:`634` specification text at time of
writing. The implementation seems reasonable, so amending the PEP text seems
like the best way to resolve the discrepancy.)

If a ``'**' as NAME`` double star pattern is present, that name is bound to a
``dict`` containing any remaining key-value pairs from the subject mapping
(the dict will be empty if there are no additional key-value pairs).

A mapping pattern may contain at most one double star pattern,
and it must be last.

Value subpatterns are mostly required to be closed patterns, but the parentheses
may be omitted for value constraints (the ``:`` key/value separator is still
required to ensure the entry doesn't look like an ordinary comparison operation).

Mapping values may also be captured unconditionally using the ``KEY as NAME``
form, without either parentheses or the ``:`` key/value separator.


Instance attribute constraints
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Surface syntax::

    attrs_constraint:
        | name_or_attr '{' [attrs_constraint_elements] '}'
    attrs_constraint_elements: ','.attr_value_pattern+ ','?
    attr_value_pattern:
        | '.' NAME pattern_as_clause
        | '.' NAME value_constraint
        | '.' NAME ':' simple_pattern
        | '.' NAME

Abstract syntax::

    MatchAttrs(expr cls, identifier* attrs, pattern* patterns)

Instance attribute constraints allow an instance's type to be checked and
|
|
attributes to optionally be extracted.
|
|
|
|
An instance attribute constraint may not repeat the same attribute name multiple
|
|
times. Attempting to do so will result in a syntax error.
|
|
|
|
An instance attribute pattern fails if the subject is not an instance of
|
|
``name_or_attr``. This is tested using ``isinstance()``.
|
|
|
|
If ``name_or_attr`` is not an instance of the builtin ``type``,
|
|
``TypeError`` is raised.
|
|
|
|
If no attribute subpatterns are present, the constraint succeeds if the
|
|
``isinstance()`` check succeeds. Otherwise:
|
|
|
|
- Each given attribute name is looked up as an attribute on the subject.
|
|
|
|
- If this raises an exception other than ``AttributeError``,
|
|
the exception bubbles up.
|
|
|
|
- If this raises ``AttributeError`` the constraint fails.
|
|
|
|
  - Otherwise, the subpattern associated with the attribute is matched
|
|
against the attribute value. If no subpattern is specified, the wildcard
|
|
pattern is assumed. If this fails, the constraint fails.
|
|
If it succeeds, the match proceeds to the next attribute.
|
|
|
|
- If all attribute subpatterns succeed, the constraint as a whole succeeds.
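The steps above can be approximated in plain Python. This is a sketch under stated assumptions rather than the reference implementation: ``match_attrs`` is a hypothetical helper, subpatterns are modelled as simple predicates (``None`` standing in for the wildcard), and ``Address`` is an example class invented for the illustration:

```python
def match_attrs(subject, cls, attr_patterns):
    """Return a dict of attribute values on success, or None on failure.

    ``attr_patterns`` maps attribute names to predicates; using a dict
    means an attribute name cannot be repeated.
    """
    if not isinstance(cls, type):
        raise TypeError("attribute constraints require a class")
    if not isinstance(subject, cls):
        return None
    captured = {}
    for name, predicate in attr_patterns.items():
        try:
            value = getattr(subject, name)  # other exceptions bubble up
        except AttributeError:
            return None
        if predicate is not None and not predicate(value):
            return None
        captured[name] = value
    return captured


class Address:  # hypothetical example type
    def __init__(self, host, port=None):
        self.host = host
        if port is not None:
            self.port = port


# Roughly ``case object{.host as host, .port as port}:``
both = match_attrs(Address("example.org", 80), object, {"host": None, "port": None})
# The same constraint fails when ``.port`` is missing
missing_port = match_attrs(Address("example.org"), object, {"host": None, "port": None})
```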
|
|
|
|
Instance attribute constraints allow ducktyping checks to be implemented by
|
|
using ``object`` as the required instance type (e.g.
|
|
``case object{.host as host, .port as port}:``).
|
|
|
|
The syntax being proposed here could potentially also be used as the basis for
|
|
a new syntax for retrieving multiple attributes from an object instance in one
|
|
assignment statement (e.g. ``host, port = addr{.host, .port}``). See the
|
|
Deferred Ideas section for further discussion of this point.
|
|
|
|
|
|
Class defined constraints
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
Surface syntax::
|
|
|
|
    class_constraint:
        | name_or_attr '(' ')'
        | name_or_attr '(' positional_patterns ','? ')'
        | name_or_attr '(' class_constraint_attrs ')'
        | name_or_attr '(' positional_patterns ',' class_constraint_attrs ')'
    positional_patterns: ','.positional_pattern+
    positional_pattern:
        | simple_pattern
        | pattern_as_clause
    class_constraint_attrs:
        | '**' '{' [attrs_constraint_elements] '}'
|
|
|
|
Abstract syntax::
|
|
|
|
MatchClass(expr cls, pattern* patterns, identifier* extra_attrs, pattern* extra_patterns)
|
|
|
|
Class defined constraints allow a sequence of common attributes to be
|
|
specified on a class and checked positionally, rather than needing to specify
|
|
the attribute names in every related match pattern.
|
|
|
|
As for instance attribute patterns:
|
|
|
|
- a class defined pattern fails if the subject is not an instance of
|
|
``name_or_attr``. This is tested using ``isinstance()``.
|
|
- if ``name_or_attr`` is not an instance of the builtin ``type``,
|
|
``TypeError`` is raised.
|
|
|
|
Regardless of whether or not any arguments are present, the subject is checked
|
|
for a ``__match_args__`` attribute using the equivalent of
|
|
``getattr(cls, "__match_args__", _SENTINEL)``.
|
|
|
|
If this raises an exception the exception bubbles up.
|
|
|
|
If the returned value is not a list, tuple, or ``None``, the conversion fails
|
|
and ``TypeError`` is raised at runtime.
|
|
|
|
This means that only types that actually define ``__match_args__`` will be
|
|
usable in class defined patterns. Types that don't define ``__match_args__``
|
|
will still be usable in instance attribute patterns.
|
|
|
|
If ``__match_args__`` is ``None``, then only a single positional subpattern is
|
|
permitted. Attempting to specify additional attribute patterns either
|
|
positionally or using the double star syntax will cause ``TypeError`` to be
|
|
raised at runtime.
|
|
|
|
This positional subpattern is then matched against the entire subject, allowing
|
|
a type check to be combined with another match pattern (e.g. checking both
|
|
the type and contents of a container, or the type and value of a number).
|
|
|
|
If ``__match_args__`` is a list or tuple, then the class defined constraint is
|
|
converted to an instance attributes constraint as follows:
|
|
|
|
- if only the double star attribute constraints subpattern is present, matching
|
|
proceeds as if for the equivalent instance attributes constraint.
|
|
- if there are more positional subpatterns than the length of
|
|
``__match_args__`` (as obtained using ``len()``), ``TypeError`` is raised.
|
|
- Otherwise, positional pattern ``i`` is converted to an attribute pattern
|
|
using ``__match_args__[i]`` as the attribute name.
|
|
- if any element in ``__match_args__`` is not a string, ``TypeError`` is raised.
|
|
- once the positional patterns have been converted to attribute patterns, then
|
|
they are combined with any attribute constraints given in the double star
|
|
attribute constraints subpattern, and matching proceeds as if for the
|
|
equivalent instance attributes constraint.
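The conversion rules can be sketched as a small helper. Everything here is illustrative: ``convert_class_constraint`` is a hypothetical name, subpatterns are represented by opaque placeholder values, and ``BinaryOp`` and ``Wrapper`` are example classes rather than anything defined by this PEP:

```python
_SENTINEL = object()


def convert_class_constraint(cls, positional, extra_attrs=()):
    """Return ("subject", patterns) or ("attrs", [(name, pattern), ...])."""
    match_args = getattr(cls, "__match_args__", _SENTINEL)
    if match_args is None:
        # Only a single positional subpattern is allowed, and it is
        # matched against the entire subject
        if len(positional) > 1 or extra_attrs:
            raise TypeError("only one positional subpattern is permitted")
        return "subject", list(positional)
    if not isinstance(match_args, (list, tuple)):
        raise TypeError("__match_args__ must be a list, tuple, or None")
    if len(positional) > len(match_args):
        raise TypeError("too many positional subpatterns")
    if not all(isinstance(name, str) for name in match_args):
        raise TypeError("__match_args__ elements must be strings")
    attrs = [(match_args[i], pattern) for i, pattern in enumerate(positional)]
    attrs.extend(extra_attrs)  # from the ``**{...}`` subpattern, if any
    return "attrs", attrs


class BinaryOp:  # hypothetical example type
    __match_args__ = ("op", "left", "right")


class Wrapper:  # hypothetical type opting in to whole-subject matching
    __match_args__ = None


converted = convert_class_constraint(BinaryOp, ["P0", "P1"], [("right", "P2")])
whole_subject = convert_class_constraint(Wrapper, ["P0"])
```

Here ``converted`` pairs ``op`` and ``left`` with the positional placeholders and appends the explicitly named ``right`` entry, after which matching would proceed as for an instance attributes constraint.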
|
|
|
|
Note: the ``__match_args__ is None`` handling in this PEP replaces the special
|
|
casing of ``bool``, ``bytearray``, ``bytes``, ``dict``, ``float``,
|
|
``frozenset``, ``int``, ``list``, ``set``, ``str``, and ``tuple`` in :pep:`634`.
|
|
However, the optimised fast path for those types is retained in the
|
|
implementation.
|
|
|
|
|
|
Design Discussion
|
|
=================
|
|
|
|
Requiring explicit qualification of simple names in match patterns
|
|
------------------------------------------------------------------
|
|
|
|
The first iteration of this PEP accepted the basic premise of :pep:`634` that
|
|
iterable unpacking syntax would provide a good foundation for defining a new
|
|
syntax for pattern matching.
|
|
|
|
During the review process, however, two major and one minor ambiguity problems
|
|
were highlighted that arise directly from that core assumption:
|
|
|
|
* most problematically, when binding simple names by default is extended to
|
|
:pep:`634`'s proposed class pattern syntax, the ``ATTR=TARGET_NAME`` construct
|
|
binds to the right without using the ``as`` keyword, and uses the normal
|
|
assignment-to-the-left sigil (``=``) to do it!
|
|
* when binding simple names by default is extended to :pep:`634`'s proposed mapping
|
|
pattern syntax, the ``KEY: TARGET_NAME`` construct binds to the right without
|
|
using the ``as`` keyword
|
|
* using a :pep:`634` capture pattern together with an AS pattern
|
|
(``TARGET_NAME_1 as TARGET_NAME_2``) gives an odd "binds to both the left and
|
|
right" behaviour
|
|
|
|
The third revision of this PEP accounted for these problems by abandoning the
|
|
alignment with iterable unpacking syntax, and instead requiring that all uses
|
|
of bare simple names for anything other than a variable lookup be qualified by
|
|
a preceding sigil or keyword:
|
|
|
|
* ``as NAME``: local variable binding
|
|
* ``.NAME``: attribute lookup
|
|
* ``== NAME``: variable lookup
|
|
* ``is NAME``: variable lookup
|
|
* any other usage: variable lookup
|
|
|
|
The key benefit of this approach is that it makes interpretation of simple names
|
|
in patterns a local activity: a leading ``as`` indicates a name binding, a
|
|
leading ``.`` indicates an attribute lookup, and anything else is a variable
|
|
lookup (regardless of whether we're reading a subpattern or a subexpression).
|
|
|
|
With the syntax now proposed in this PEP, the problematic cases identified above
|
|
no longer read poorly:
|
|
|
|
* ``.ATTR as TARGET_NAME`` is more obviously a binding than ``ATTR=TARGET_NAME``
|
|
* ``KEY as TARGET_NAME`` is more obviously a binding than ``KEY: TARGET_NAME``
|
|
* ``(as TARGET_NAME_1) as TARGET_NAME_2`` is more obviously two bindings than
|
|
``TARGET_NAME_1 as TARGET_NAME_2``
|
|
|
|
|
|
Resisting the temptation to guess
|
|
---------------------------------
|
|
|
|
:pep:`635` looks at the way pattern matching is used in other languages, and
|
|
attempts to use that information to make plausible predictions about the way
|
|
pattern matching will be used in Python:
|
|
|
|
* wanting to extract values to local names will *probably* be more common than
|
|
wanting to match against values stored in local names
|
|
* wanting comparison by equality will *probably* be more common than wanting
|
|
comparison by identity
|
|
* users will *probably* be able to at least remember that bare names bind values
|
|
and attribute references look up values, even if they can't figure that out
|
|
for themselves without reading the documentation or having someone tell them
|
|
|
|
To be clear, I think these predictions actually *are* plausible. However, I also
|
|
don't think we need to guess about this up front: I think we can start out with
|
|
a more explicit syntax that requires users to state their intent using a prefix
|
|
marker (either ``as``, ``==``, or ``is``), and then reassess the situation in a
|
|
few years based on how pattern matching is actually being used *in Python*.
|
|
|
|
At that point, we'll be able to choose amongst at least the following options:
|
|
|
|
* deciding the explicit syntax is concise enough, and not changing anything
|
|
* adding inferred identity constraints for one or more of ``None``, ``...``,
|
|
``True`` and ``False``
|
|
* adding inferred equality constraints for other literals (potentially including
|
|
complex literals)
|
|
* adding inferred equality constraints for attribute lookups
|
|
* adding either inferred equality constraints or inferred capture patterns for
|
|
bare names
|
|
|
|
All of those ideas could be considered independently on their own merits, rather
|
|
than being a potential barrier to introducing pattern matching in the first
|
|
place.
|
|
|
|
If any of these syntactic shortcuts were to eventually be introduced, they'd
|
|
also be straightforward to explain in terms of the underlying more explicit
|
|
syntax (the leading ``as``, ``==``, or ``is`` would just be getting inferred
|
|
by the parser, without the user needing to provide it explicitly). At the
|
|
implementation level, only the parser should need to be changed, as the existing
|
|
AST nodes could be reused.
|
|
|
|
|
|
Interaction with caching of attribute lookups in local variables
|
|
----------------------------------------------------------------
|
|
|
|
One of the major changes between this PEP and :pep:`634` is to use ``== EXPR``
|
|
for equality constraint lookups, rather than only offering ``NAME.ATTR``. The
|
|
original motivation for this was to avoid the semantic conflict with regular
|
|
assignment targets, where ``NAME.ATTR`` is already used in assignment statements
|
|
to set attributes, so if ``NAME.ATTR`` were the *only* syntax for symbolic value
|
|
matching, then we're pre-emptively ruling out any future attempts to allow
|
|
matching against single patterns using the existing assignment statement syntax.
|
|
The current motivation is more about the general desire to avoid guessing about
|
|
users' intent, and instead requiring them to state it explicitly in the syntax.
|
|
|
|
However, even within match statements themselves, the ``name.attr`` syntax for
|
|
value patterns has an undesirable interaction with local variable assignment,
|
|
where routine refactorings that would be semantically neutral for any other
|
|
Python statement introduce a major semantic change when applied to a :pep:`634`
|
|
style match statement.
|
|
|
|
Consider the following code::
|
|
|
|
while value < self.limit:
|
|
... # Some code that adjusts "value"
|
|
|
|
The attribute lookup can be safely lifted out of the loop and only performed
|
|
once::
|
|
|
|
    _limit = self.limit
    while value < _limit:
        ... # Some code that adjusts "value"
|
|
|
|
With the marker prefix based syntax proposal in this PEP, value constraints
|
|
would be similarly tolerant of match patterns being refactored to use a local
|
|
variable instead of an attribute lookup, with the following two statements
|
|
being functionally equivalent::
|
|
|
|
match expr:
|
|
case {"key": == self.target}:
|
|
... # Handle the case where 'expr["key"] == self.target'
|
|
case __:
|
|
... # Handle the non-matching case
|
|
|
|
_target = self.target
|
|
match expr:
|
|
case {"key": == _target}:
|
|
... # Handle the case where 'expr["key"] == self.target'
|
|
case __:
|
|
... # Handle the non-matching case
|
|
|
|
By contrast, when using :pep:`634`'s value and capture pattern syntaxes that omit
|
|
the marker prefix, the following two statements wouldn't be equivalent at all::
|
|
|
|
# PEP 634's value pattern syntax
|
|
match expr:
|
|
case {"key": self.target}:
|
|
... # Handle the case where 'expr["key"] == self.target'
|
|
case _:
|
|
... # Handle the non-matching case
|
|
|
|
# PEP 634's capture pattern syntax
|
|
_target = self.target
|
|
match expr:
|
|
case {"key": _target}:
|
|
... # Matches any mapping with "key", binding its value to _target
|
|
case _:
|
|
... # Handle the non-matching case
|
|
|
|
This PEP ensures the original semantics are retained under this style of
|
|
simplistic refactoring: use ``== name`` to force interpretation of the result
|
|
as a value constraint, use ``as name`` for a name binding.
|
|
|
|
:pep:`634`'s proposal to offer only the shorthand syntax, with no explicitly
|
|
prefixed form, means that the primary answer on offer is "Well, don't do that,
|
|
then, only compare against attributes in namespaces, don't compare against
|
|
simple names".
|
|
|
|
:pep:`622`'s walrus pattern syntax had another odd interaction where it might not
|
|
bind the same object as the exact same walrus expression in the body of the
|
|
case clause, but :pep:`634` fixed that discrepancy by replacing walrus patterns
|
|
with AS patterns (where the fact that the value bound to the name on the RHS
|
|
might not be the same value as returned by the LHS is a standard feature common
|
|
to all uses of the "as" keyword).
|
|
|
|
|
|
Using existing comparison operators as the value constraint prefix
|
|
--------------------------------------------------------------------
|
|
|
|
If the benefit of a dedicated value constraint prefix is accepted, then the
|
|
next question is to ask exactly what that prefix should be.
|
|
|
|
The initially published version of this PEP proposed using the previously
|
|
unused ``?`` symbol as the prefix for equality constraints, and ``?is`` as the
|
|
prefix for identity constraints. When reviewing the PEP, Steven D'Aprano
|
|
presented a compelling counterproposal [5]_ to use the existing comparison
|
|
operators (``==`` and ``is``) instead.
|
|
|
|
There were a few concerns with ``==`` as a prefix that kept it from being
|
|
chosen as the prefix in the initial iteration of the PEP:
|
|
|
|
* for common use cases, it's even more visually noisy than ``?``, as a lot of
|
|
folks with :pep:`8` trained aesthetic sensibilities are going to want to put
|
|
a space between it and the following expression, effectively making it a 3
|
|
character prefix instead of 1
|
|
* when used in a mapping pattern, there needs to be a space between the ``:``
|
|
key/value separator and the ``==`` prefix, or the tokeniser will split them
|
|
up incorrectly (getting ``:=`` and ``=`` instead of ``:`` and ``==``)
|
|
* when used in an OR pattern, there needs to be a space between the ``|``
|
|
pattern separator and the ``==`` prefix, or the tokeniser will split them
|
|
up incorrectly (getting ``|=`` and ``=`` instead of ``|`` and ``==``)
|
|
* if used in a :pep:`634` style class pattern, there needs to be a space between
|
|
the ``=`` keyword separator and the ``==`` prefix, or the tokeniser will split
|
|
them up incorrectly (getting ``==`` and ``=`` instead of ``=`` and ``==``)
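These tokenisation concerns can be checked with the standard library's ``tokenize`` module, which only splits the source into tokens and so accepts input that is not valid Python. The helper name ``operator_tokens`` and the sample snippets are illustrative only:

```python
import io
import tokenize


def operator_tokens(source):
    """Return the operator tokens the standard tokeniser finds in source."""
    readline = io.StringIO(source).readline
    return [
        tok.string
        for tok in tokenize.generate_tokens(readline)
        if tok.type == tokenize.OP
    ]


mapping_no_space = operator_tokens('{"key":== x}\n')
mapping_spaced = operator_tokens('{"key": == x}\n')
or_no_space = operator_tokens("(== a |== b)\n")
keyword_no_space = operator_tokens("Point(x=== 0)\n")
```

Without the separating space, the tokeniser greedily produces ``:=``, ``|=``, or ``==`` followed by a stray ``=``, rather than the intended separator followed by ``==``.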
|
|
|
|
Rather than introducing a completely new symbol, Steven's proposed resolution to
|
|
this verbosity problem was to retain the ability to omit the prefix marker in
|
|
syntactically unambiguous cases.
|
|
|
|
While the idea of omitting the prefix marker was accepted for the second
|
|
revision of the proposal, it was dropped again in the third revision due to
|
|
ambiguity concerns. Instead, the following points apply:
|
|
|
|
* for class patterns, other syntax changes allow equality constraints to be
|
|
written as ``.ATTR == EXPR``, and identity constraints to be written as
|
|
``.ATTR is EXPR``, both of which are quite easy to read
|
|
* for mapping patterns, the extra syntactic noise is just tolerated (at least
|
|
for now)
|
|
* for OR patterns, the extra syntactic noise is just tolerated (at least
|
|
for now). However, `membership constraints`_ may offer a future path to
|
|
reducing the need to combine OR patterns with equality constraints (instead,
|
|
the values to be checked against would be collected as a set, list, or tuple).
|
|
|
|
Given that perspective, :pep:`635`'s arguments against using ``?`` as part of the
|
|
pattern matching syntax held for this proposal as well, and so the PEP was
|
|
amended accordingly.
|
|
|
|
|
|
Using ``__`` as the wildcard pattern marker
|
|
-------------------------------------------
|
|
|
|
:pep:`635` makes a solid case that introducing ``?`` *solely* as a wildcard pattern
|
|
marker would be a bad idea. With the syntax for value constraints changed
|
|
to use existing comparison operations rather than ``?`` and ``?is``, that
|
|
argument holds for this PEP as well.
|
|
|
|
However, as noted by Thomas Wouters in [6]_, :pep:`634`'s choice of ``_`` remains
|
|
problematic as it would likely mean that match patterns would have a *permanent*
|
|
difference from all other parts of Python - the use of ``_`` in software
|
|
internationalisation and at the interactive prompt means that there isn't really
|
|
a plausible path towards using it as a general purpose "skipped binding" marker.
|
|
|
|
``__`` is an alternative "this value is not needed" marker drawn from a Stack
|
|
Overflow answer [7]_ (originally posted by the author of this PEP) on the
|
|
various meanings of ``_`` in existing Python code.
|
|
|
|
This PEP also proposes adopting an implementation technique that limits
|
|
the scope of the associated special casing of ``__`` to the parser: defining a
|
|
new AST node type (``MatchAlways``) specifically for wildcard markers, rather
|
|
than passing it through to the AST as a ``Name`` node.
|
|
|
|
Within the parser, ``__`` still means either a regular name or a wildcard
|
|
marker in a match pattern depending on where you are in the parse tree, but
|
|
within the rest of the compiler, ``Name("__")`` is still a normal variable name,
|
|
while ``MatchAlways()`` is always a wildcard marker in a match pattern.
|
|
|
|
Unlike ``_``, the lack of other use cases for ``__`` means that there would be
|
|
a plausible path towards restoring identifier handling consistency with the rest
|
|
of the language by making ``__`` mean "skip this name binding" everywhere in
|
|
Python:
|
|
|
|
* in the interpreter itself, deprecate loading variables with the name ``__``.
|
|
This would make reading from ``__`` emit a deprecation warning, while writing
|
|
to it would initially be unchanged. To avoid slowing down all name loads, this
|
|
could be handled by having the compiler emit additional code for the
|
|
deprecated name, rather than using a runtime check in the standard name
|
|
loading opcodes.
|
|
* after a suitable number of releases, change the parser to emit
|
|
a new ``SkippedBinding`` AST node for all uses of ``__`` as an assignment
|
|
target, and update the rest of the compiler accordingly
|
|
* consider making ``__`` a true hard keyword rather than a soft keyword
|
|
|
|
This deprecation path couldn't be followed for ``_``, as there's no way for the
|
|
interpreter to distinguish between attempts to read back ``_`` when nominally
|
|
used as a "don't care" marker, and legitimate reads of ``_`` as either an
|
|
i18n text translation function or as the last statement result at the
|
|
interactive prompt.
|
|
|
|
Names starting with double-underscores are also already reserved for use by the
|
|
language, whether that is for compile time constants (i.e. ``__debug__``),
|
|
special methods, or class attribute name mangling, so using ``__`` here would
|
|
be consistent with that existing approach.
|
|
|
|
|
|
Representing patterns explicitly in the Abstract Syntax Tree
|
|
------------------------------------------------------------
|
|
|
|
:pep:`634` doesn't explicitly discuss how match statements should be represented
|
|
in the Abstract Syntax Tree, instead leaving that detail to be defined as part
|
|
of the implementation.
|
|
|
|
As a result, while the reference implementation of :pep:`634` definitely works (and
|
|
formed the basis of the reference implementation of this PEP), it does contain
|
|
a significant design flaw: despite the notes in :pep:`635` that patterns should be
|
|
considered as distinct from expressions, the reference implementation goes ahead
|
|
and represents them in the AST as expression nodes.
|
|
|
|
The result is an AST that isn't very abstract at all: nodes that should be
|
|
compiled completely differently (because they're patterns rather than
|
|
expressions) are represented the same way, and the type system of the
|
|
implementation language (e.g. C for CPython) can't offer any assistance in
|
|
keeping track of which subnodes should be ordinary expressions and which should
|
|
be subpatterns.
|
|
|
|
Rather than continuing with that approach, this PEP has instead defined a new
|
|
explicit "pattern" node in the AST, which allows the patterns and their
|
|
permitted subnodes to be defined explicitly in the AST itself, making the code
|
|
implementing the new feature clearer, and allowing the C compiler to provide
|
|
more assistance in keeping track of when the code generator is dealing with
|
|
patterns or expressions.
|
|
|
|
This change in implementation approach is actually orthogonal to the surface
|
|
syntax changes proposed in this PEP, so it could still be adopted even if the
|
|
rest of the PEP were to be rejected.
|
|
|
|
|
|
Changes to sequence patterns
|
|
----------------------------
|
|
|
|
This PEP makes one notable change to sequence patterns relative to :pep:`634`:
|
|
|
|
* only the square bracket form of sequence pattern is supported. Neither open
|
|
(no delimiters) nor tuple style (parentheses as delimiters) sequence patterns
|
|
are supported.
|
|
|
|
Relative to :pep:`634`, sequence patterns are also significantly affected by the
|
|
change to require explicit qualification of capture patterns and value
|
|
constraints, as it means ``case [a, b, c]:`` must instead be written as
|
|
``case [as a, as b, as c]:`` and ``case [0, 1]:`` must instead be written as
|
|
``case [== 0, == 1]:``.
|
|
|
|
With the syntax for sequence patterns no longer being derived directly from the
|
|
syntax for iterable unpacking, it no longer made sense to keep the syntactic
|
|
flexibility that had been included in the original syntax proposal purely for
|
|
consistency with iterable unpacking.
|
|
|
|
Allowing open and tuple style sequence patterns didn't increase expressivity,
|
|
only ambiguity of intent (especially relative to group patterns), and encouraged
|
|
readers down the path of viewing pattern matching syntax as intrinsically linked
|
|
to assignment target syntax (which the :pep:`634` authors have stated multiple
|
|
times is not a desirable path to have readers take, and a view the author of
|
|
this PEP now shares, despite disagreeing with it originally).
|
|
|
|
|
|
Changes to mapping patterns
|
|
---------------------------
|
|
|
|
This PEP makes two notable changes to mapping patterns relative to :pep:`634`:
|
|
|
|
* value capturing is written as ``KEY as NAME`` rather than as ``KEY: NAME``
|
|
* a wider range of keys are permitted: any "closed expression", rather than
|
|
only literals and attribute references
|
|
|
|
As discussed above, the first change is part of ensuring that all binding
|
|
operations with the target name to the right of a subexpression or pattern
|
|
use the ``as`` keyword.
|
|
|
|
The second change is mostly a matter of simplifying the parser and code
|
|
generator code by reusing the existing expression handling machinery. The
|
|
restriction to closed expressions is designed to help reduce ambiguity as to
|
|
where the key expression ends and the match pattern begins. This mostly allows
|
|
a superset of what :pep:`634` allows, except that complex literals must be written
|
|
in parentheses (at least for now).
|
|
|
|
Adapting :pep:`635`'s mapping pattern examples to the syntax proposed in this PEP::
|
|
|
|
match json_pet:
|
|
case {"type": == "cat", "name" as name, "pattern" as pattern}:
|
|
return Cat(name, pattern)
|
|
case {"type": == "dog", "name" as name, "breed" as breed}:
|
|
return Dog(name, breed)
|
|
case __:
|
|
raise ValueError("Not a suitable pet")
|
|
|
|
def change_red_to_blue(json_obj):
|
|
match json_obj:
|
|
case { 'color': (== 'red' | == '#FF0000') }:
|
|
json_obj['color'] = 'blue'
|
|
case { 'children' as children }:
|
|
for child in children:
|
|
change_red_to_blue(child)
|
|
|
|
For reference, the equivalent :pep:`634` syntax::
|
|
|
|
match json_pet:
|
|
case {"type": "cat", "name": name, "pattern": pattern}:
|
|
return Cat(name, pattern)
|
|
case {"type": "dog", "name": name, "breed": breed}:
|
|
return Dog(name, breed)
|
|
case _:
|
|
raise ValueError("Not a suitable pet")
|
|
|
|
def change_red_to_blue(json_obj):
|
|
match json_obj:
|
|
case { 'color': ('red' | '#FF0000') }:
|
|
json_obj['color'] = 'blue'
|
|
case { 'children': children }:
|
|
for child in children:
|
|
change_red_to_blue(child)
|
|
|
|
|
|
Changes to class patterns
|
|
-------------------------
|
|
|
|
This PEP makes several notable changes to class patterns relative to :pep:`634`:
|
|
|
|
* the syntactic alignment with class instantiation is abandoned as being
|
|
actively misleading and unhelpful. Instead, a new dedicated syntax for
|
|
checking additional attributes is introduced that draws inspiration from
|
|
mapping patterns rather than class instantiation
|
|
* a new dedicated syntax for simple ducktyping that will work for any class
|
|
is introduced
|
|
* the special casing of various builtin and standard library types is
|
|
supplemented by a general check for the existence of a ``__match_args__``
|
|
attribute with the value of ``None``
|
|
|
|
As discussed above, the first change has two purposes:
|
|
|
|
* it's part of ensuring that all binding operations with the target name to the
|
|
right of a subexpression or pattern use the ``as`` keyword. Using ``=`` to
|
|
assign to the right is particularly problematic.
|
|
* it's part of ensuring that all uses of simple names in patterns have a prefix
|
|
that indicates their purpose (in this case, a leading ``.`` to indicate an
|
|
attribute lookup)
|
|
|
|
The syntactic alignment with class instantiation was also judged to be unhelpful
|
|
in general, as class patterns are about matching patterns against attributes,
|
|
while class instantiation is about matching call arguments to parameters in
|
|
class constructors, which may not bear much resemblance to the resulting
|
|
instance attributes at all.
|
|
|
|
The second change is intended to make it easier to use pattern matching for the
|
|
"ducktyping" style checks that are already common in Python.
|
|
|
|
The concrete syntax proposal for these patterns then arose from viewing
|
|
instances as mappings of attribute names to values, and combining the attribute
|
|
lookup syntax (``.ATTR``), with the mapping pattern syntax ``{KEY: PATTERN}``
|
|
to give ``cls{.ATTR: PATTERN}``.
|
|
|
|
Allowing ``cls{.ATTR}`` to mean the same thing as ``cls{.ATTR: __}`` was a
|
|
matter of considering the leading ``.`` sufficient to render the name usage
|
|
unambiguous (it's clearly an attribute reference, whereas matching against a variable
|
|
key in a mapping pattern would be arguably ambiguous).
|
|
|
|
The final change just supplements a CPython-internal-only check in the :pep:`634`
|
|
reference implementation by making it the default behaviour that classes get if
|
|
they don't define ``__match_args__`` (the optimised fast path for the builtin
|
|
and standard library types named in :pep:`634` is retained).
|
|
|
|
Adapting the class matching example
|
|
`linked from PEP 635 <https://github.com/gvanrossum/patma/blob/be5969442d0584005492134c3b24eea408709db2/examples/expr.py#L231>`_
|
|
shows that for purely positional class matching, the main impact comes from the
|
|
changes to value constraints and name binding, not from the class matching
|
|
changes::
|
|
|
|
match expr:
|
|
case BinaryOp(== '+', as left, as right):
|
|
return eval_expr(left) + eval_expr(right)
|
|
case BinaryOp(== '-', as left, as right):
|
|
return eval_expr(left) - eval_expr(right)
|
|
case BinaryOp(== '*', as left, as right):
|
|
return eval_expr(left) * eval_expr(right)
|
|
case BinaryOp(== '/', as left, as right):
|
|
return eval_expr(left) / eval_expr(right)
|
|
case UnaryOp(== '+', as arg):
|
|
return eval_expr(arg)
|
|
case UnaryOp(== '-', as arg):
|
|
return -eval_expr(arg)
|
|
case VarExpr(as name):
|
|
raise ValueError(f"Unknown value of: {name}")
|
|
case float() | int():
|
|
return expr
|
|
case __:
|
|
raise ValueError(f"Invalid expression value: {repr(expr)}")
|
|
|
|
For reference, the equivalent :pep:`634` syntax::
|
|
|
|
match expr:
|
|
case BinaryOp('+', left, right):
|
|
return eval_expr(left) + eval_expr(right)
|
|
case BinaryOp('-', left, right):
|
|
return eval_expr(left) - eval_expr(right)
|
|
case BinaryOp('*', left, right):
|
|
return eval_expr(left) * eval_expr(right)
|
|
case BinaryOp('/', left, right):
|
|
return eval_expr(left) / eval_expr(right)
|
|
case UnaryOp('+', arg):
|
|
return eval_expr(arg)
|
|
case UnaryOp('-', arg):
|
|
return -eval_expr(arg)
|
|
case VarExpr(name):
|
|
raise ValueError(f"Unknown value of: {name}")
|
|
case float() | int():
|
|
return expr
|
|
case _:
|
|
raise ValueError(f"Invalid expression value: {repr(expr)}")
|
|
|
|
The changes to the class pattern syntax itself are more relevant when
|
|
checking for named attributes and extracting their values without relying on
|
|
``__match_args__``::
|
|
|
|
match expr:
|
|
case object{.host as host, .port as port}:
|
|
pass
|
|
case object{.host as host}:
|
|
pass
|
|
|
|
Compare this to the :pep:`634` equivalent, where it really isn't clear which names
|
|
are referring to attributes of the match subject and which names are referring
|
|
to local variables::
|
|
|
|
match expr:
|
|
case object(host=host, port=port):
|
|
pass
|
|
case object(host=host):
|
|
pass
|
|
|
|
In this specific case, that ambiguity doesn't matter (since the attribute and
|
|
variable names are the same), but in the general case, knowing which is which
|
|
will be critical to reasoning correctly about the code being read.
|
|
|
|
|
|
Deferred Ideas
|
|
==============
|
|
|
|
Inferred value constraints
|
|
--------------------------
|
|
|
|
As discussed above, this PEP doesn't rule out the possibility of adding
|
|
inferred equality and identity constraints in the future.
|
|
|
|
These could be particularly valuable for literals, as it is quite likely that
|
|
many "magic" strings and numbers with self-evident meanings will be written
|
|
directly into match patterns, rather than being stored in named variables.
|
|
(Think constants like ``None``, or obviously special numbers like ``0`` and
|
|
``1``, or strings where their contents are as descriptive as any variable name,
|
|
rather than cryptic checks against opaque numbers like ``739452``)
|
|
|
|
|
|
Making some required parentheses optional
|
|
-----------------------------------------
|
|
|
|
The PEP currently errs heavily on the side of requiring parentheses in the face
|
|
of potential ambiguity.
|
|
|
|
However, there are a number of cases where it at least arguably goes too far,
|
|
mostly involving AS patterns with an explicit pattern.
|
|
|
|
In any position that requires a closed pattern, AS patterns may end up starting
|
|
with doubled parentheses, as the nested pattern is also required to be a closed
|
|
pattern: ``((OPEN PTRN) as NAME)``
|
|
|
|
Due to the requirement that the subpattern be closed, it should be reasonable
|
|
in many of these cases (e.g. sequence pattern subpatterns) to accept
|
|
``CLOSED_PTRN as NAME`` directly.
|
|
|
|
Further consideration of this point has been deferred, as making required
|
|
parentheses optional is a backwards compatible change, and hence relaxing the
|
|
restrictions later can be considered on a case-by-case basis.
|
|
|
|
|
|
Accepting complex literals as closed expressions
|
|
------------------------------------------------
|
|
|
|
:pep:`634`'s reference implementation includes a lot of special casing of binary
|
|
operations in both the parser and the rest of the compiler in order to accept
|
|
complex literals without accepting arbitrary binary numeric operations on
|
|
literal values.
|
|
|
|
Ideally, this problem would be dealt with at the parser layer, with the parser
|
|
directly emitting a Constant AST node prepopulated with a complex number. If
|
|
that was the way things worked, then complex literals could be accepted through
|
|
a similar mechanism to any other literal.
|
|
|
|
This isn't how complex literals are handled, however. Instead, they're passed
|
|
through to the AST as regular ``BinOp`` nodes, and then the constant folding
|
|
pass on the AST resolves them down to ``Constant`` nodes with a complex value.
|
|
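The current behaviour can be observed with the standard ``ast`` module. The snippet below is only an illustration of the point above, not part of the proposal; the variable names are arbitrary:

```python
import ast

# Parse a complex "literal" without running any optimisation passes.
node = ast.parse("1 + 2j", mode="eval").body

# The parser emits a binary operation, not a single constant node.
is_binop = isinstance(node, ast.BinOp)
is_addition = isinstance(node.op, ast.Add)

# The operands are ordinary Constant nodes: a real and an imaginary number.
left_value = node.left.value
right_value = node.right.value
```

Only after later compilation stages have run does the expression behave like a single complex constant.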
|
|
For the parser to resolve complex literals directly, the compiler would need to
|
|
be able to tell the tokenizer to generate a distinct token type for
|
|
imaginary numbers (e.g. ``INUMBER``), which would then allow the parser to
|
|
handle ``NUMBER + INUMBER`` and ``NUMBER - INUMBER`` separately from other
|
|
binary operations.
|
|
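As a rough illustration of why a distinct token type would be needed, the standard ``tokenize`` module reports real and imaginary numbers with the same ``NUMBER`` token type (the ``INUMBER`` name above is hypothetical and does not exist today):

```python
import io
import tokenize

source = "1 + 2j\n"

# Collect (token type name, token text) pairs for the numbers and operators.
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type in (tokenize.NUMBER, tokenize.OP)
]

# Both "1" and "2j" come back as NUMBER, so the token type alone cannot
# tell a complex literal apart from any other addition of two numbers.
number_tokens = [text for kind, text in tokens if kind == "NUMBER"]
```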
|
|
Alternatively, a new ``ComplexNumber`` AST node type could be defined, which
|
|
would allow the parser to notify the subsequent compiler stages that a
|
|
particular node should specifically be a complex literal, rather than an
|
|
arbitrary binary operation. Then the parser could accept ``NUMBER + NUMBER``
|
|
and ``NUMBER - NUMBER`` for that node, while letting the AST validation for
|
|
``ComplexNumber`` take care of ensuring that the real and imaginary parts of
|
|
the literal were real and imaginary numbers as expected.
|
|
|
|
For now, this PEP has postponed dealing with this question, and instead just
|
|
requires that complex literals be parenthesised in order to be used in value
|
|
constraints and as mapping pattern keys.
|
|
|
|
|
|
Allowing negated constraints in match patterns
|
|
----------------------------------------------
|
|
|
|
With the syntax proposed in this PEP, it isn't permitted to write ``!= expr``
|
|
or ``is not expr`` as a match pattern.
|
|
|
|
Both of these forms have clear potential interpretations as a negated equality
|
|
constraint (i.e. ``x != expr``) and a negated identity constraint
|
|
(i.e. ``x is not expr``).
|
|
|
|
However, it's far from clear that either form would come up often enough to justify
|
|
the dedicated syntax, so the possible extension has been deferred pending further
|
|
community experience with match statements.
|
|
|
|
|
|
.. _membership constraints:
|
|
|
|
Allowing membership checks in match patterns
|
|
---------------------------------------------
|
|
|
|
The syntax used for equality and identity constraints would be straightforward
|
|
to extend to membership checks: ``in container``.
|
|
|
|
One downside of the proposals in both this PEP and :pep:`634` is that checking
|
|
for multiple values in the same case doesn't look like any existing container
|
|
membership check in Python::
|
|
|
|
# PEP 634's literal patterns
|
|
match value:
|
|
case 0 | 1 | 2 | 3:
|
|
...
|
|
|
|
# This PEP's equality constraints
|
|
match value:
|
|
case == 0 | == 1 | == 2 | == 3:
|
|
...
|
|
|
|
Allowing inferred equality constraints under this PEP would only make it look
|
|
like the :pep:`634` example; it still wouldn't look like the equivalent ``if``
|
|
statement header (``if value in {0, 1, 2, 3}:``).
|
|
|
|
Membership constraints would provide a more explicit, but still concise, way
|
|
to check if the match subject was present in a container, and it would look
|
|
the same as an ordinary containment check::
|
|
|
|
match value:
|
|
case in {0, 1, 2, 3}:
|
|
...
|
|
case in {one, two, three, four}:
|
|
...
|
|
case in range(4): # It would accept any container, not just literal sets
|
|
...
|
|
|
|
Such a feature would also be readily extensible to allow all kinds of case
|
|
clauses without any further syntax updates, simply by defining ``__contains__``
|
|
appropriately on a custom class definition.
|
|
|
|
However, while this does seem like a useful extension, and a good way to resolve
|
|
this PEP's verbosity problem when combining multiple equality checks in an
|
|
OR pattern, it isn't essential to making match statements a valuable addition
|
|
to the language, so it seems more appropriate to defer it to a separate proposal,
|
|
rather than including it here.
|
|
|
|
|
|
Inferring a default type for instance attribute constraints
|
|
-----------------------------------------------------------
|
|
|
|
The dedicated syntax for instance attribute constraints means that ``object``
|
|
could be omitted from ``object{.ATTR}`` to give ``{.ATTR}`` without introducing
|
|
any syntactic ambiguity (if no class was given, ``object`` would be implied,
|
|
just as it is for the base class list in class definitions).
|
|
|
|
However, it's far from clear that saving six characters is worth making it harder to
|
|
visually distinguish mapping patterns from instance attribute patterns, so
|
|
allowing this has been deferred as a topic for possible future consideration.
|
|
|
|
|
|
Avoiding special cases in sequence patterns
|
|
-------------------------------------------
|
|
|
|
Sequence patterns in both this PEP and :pep:`634` currently special case ``str``,
|
|
``bytes``, and ``bytearray`` as specifically *never* matching a sequence
|
|
pattern.
|
|
|
|
This special casing could potentially be removed if we were to define a new
|
|
``collections.abc.AtomicSequence`` abstract base class for types like these,
|
|
where they're conceptually a single item, but still implement the sequence
|
|
protocol to allow random access to their component parts.
|
|
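A minimal sketch of how such a check might look, assuming a hypothetical ``AtomicSequence`` class defined locally (no such ABC exists in ``collections.abc`` today, and the helper below ignores the other details of sequence pattern matching):

```python
from abc import ABC
from collections.abc import Sequence


class AtomicSequence(ABC):
    """Hypothetical ABC for sequences that are conceptually one item."""


# Register the types that sequence patterns currently special case.
for atomic_type in (str, bytes, bytearray):
    AtomicSequence.register(atomic_type)


def is_sequence_pattern_subject(obj):
    """Return True if obj would be eligible to match a sequence pattern."""
    return isinstance(obj, Sequence) and not isinstance(obj, AtomicSequence)
```

With this approach, third party types could opt out of sequence matching by registering with the ABC, instead of relying on a hardcoded list.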
|
|
|
|
Expression syntax to retrieve multiple attributes from an instance
|
|
------------------------------------------------------------------
|
|
|
|
The instance attribute pattern syntax has been designed such that it could
|
|
be used as the basis for a general purpose syntax for retrieving multiple
|
|
attributes from an object in a single expression::
|
|
|
|
host, port = obj{.host, .port}
|
|
|
|
Similar to slice syntax only being allowed inside bracket subscripts, the
|
|
``.attr`` syntax for naming attributes would only be allowed inside brace
|
|
subscripts.
|
|
|
|
This idea isn't required for pattern matching to be useful, so it isn't part of
|
|
this PEP. However, it's mentioned as a possible path towards making pattern
|
|
matching feel more integrated into the rest of the language, rather than
|
|
existing forever in its own completely separated world.
|
|
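For comparison, the closest existing spelling uses ``operator.attrgetter``. This is only a sketch of the status quo; the ``obj`` instance is a stand-in built with ``types.SimpleNamespace``:

```python
from operator import attrgetter
from types import SimpleNamespace

# Stand-in object with the attributes used in the example above.
obj = SimpleNamespace(host="example.org", port=8080)

# Retrieve several attributes in a single expression.
host, port = attrgetter("host", "port")(obj)
```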
|
|
|
|
Expression syntax to retrieve multiple items from a container
|
|
-------------------------------------------------------------
|
|
|
|
If the brace subscript syntax were to be accepted for instance attribute
|
|
pattern matching, and then subsequently extended to offer general purpose
|
|
extraction of multiple attributes, then it could be extended even further to
|
|
allow for retrieval of multiple items from containers based on the syntax
|
|
used for mapping pattern matching::
|
|
|
|
host, port = obj{"host", "port"}
|
|
first, last = obj{0, -1}
|
|
|
|
Again, this idea isn't required for pattern matching to be useful, so it isn't
|
|
part of this PEP. As with retrieving multiple attributes, however, it is
|
|
included as an example of the proposed pattern matching syntax inspiring ideas
|
|
for making object deconstruction easier in general.
|
|
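The existing equivalent for items is ``operator.itemgetter``. The sample data below is made up purely for illustration:

```python
from operator import itemgetter

config = {"host": "example.org", "port": 8080}
letters = ["a", "b", "c"]

# Retrieve several keys from a mapping in a single expression.
host, port = itemgetter("host", "port")(config)

# Retrieve several positions from a sequence in a single expression.
first, last = itemgetter(0, -1)(letters)
```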
|
|
|
|
Rejected Ideas
|
|
==============
|
|
|
|
Restricting permitted expressions in value constraints and mapping pattern keys
|
|
-------------------------------------------------------------------------------
|
|
|
|
While it's entirely technically possible to restrict the kinds of expressions
|
|
permitted in value constraints and mapping pattern keys to just attribute
|
|
lookups and constant literals (as :pep:`634` does), there isn't any clear runtime
|
|
value in doing so, so this PEP proposes allowing any kind of primary expression
|
|
(primary expressions are an existing node type in the grammar that includes
|
|
things like literals, names, attribute lookups, function calls, container
|
|
subscripts, parenthesised groups, etc.), as well as high precedence unary
|
|
operations (``+``, ``-``, ``~``) on primary expressions.
|
|
|
|
While :pep:`635` does emphasise several times that literal patterns and value
|
|
patterns are not full expressions, it doesn't ever articulate a concrete benefit
|
|
that is obtained from that restriction (just a theoretical appeal to it being
|
|
useful to separate static checks from dynamic checks, which a code style
|
|
tool could still enforce, even if the compiler itself is more permissive).
|
|
|
|
The last time we imposed such a restriction was for decorator expressions and
|
|
the primary outcome of that was that users had to put up with years of awkward
|
|
syntactic workarounds (like nesting arbitrary expressions inside function calls
|
|
that just returned their argument) to express the behaviour they wanted before
|
|
the language definition was finally updated to allow arbitrary expressions and
|
|
let users make their own decisions about readability.
|
|
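As a reminder of what that workaround looked like, here is a small sketch; ``passthrough``, ``shout`` and ``greet`` are hypothetical names. Before Python 3.9, writing ``@decorators[0]`` directly was a syntax error, so the subscript had to be wrapped in a call:

```python
def passthrough(value):
    """Identity helper: returns its argument unchanged."""
    return value


def shout(func):
    """Example decorator that upper-cases the wrapped function's result."""
    def wrapper(*args):
        return func(*args).upper()
    return wrapper


decorators = [shout]


# The call to passthrough() exists only to satisfy the old decorator grammar.
@passthrough(decorators[0])
def greet(name):
    return f"hello {name}"
```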
|
|
The situation in :pep:`634` that bears a resemblance to the situation with decorator
|
|
expressions is that arbitrary expressions are technically supported in value
|
|
patterns; they just require awkward workarounds where either all the values to
|
|
match need to be specified in a helper class that is placed before the match
|
|
statement::
|
|
|
|
# Allowing arbitrary match targets with PEP 634's value pattern syntax
|
|
class mt:
|
|
value = func()
|
|
match expr:
|
|
case (_, mt.value):
|
|
... # Handle the case where 'expr[1] == func()'
|
|
|
|
Or else they need to be written as a combination of a capture pattern and a
|
|
guard expression::
|
|
|
|
# Allowing arbitrary match targets with PEP 634's guard expressions
|
|
match expr:
|
|
case (_, _matched) if _matched == func():
|
|
... # Handle the case where 'expr[1] == func()'
|
|
|
|
This PEP proposes not requiring any such workarounds, and instead
|
|
supporting arbitrary value constraints from the start::
|
|
|
|
match expr:
|
|
    case [__, == func()]:
|
|
        ... # Handle the case where 'expr[1] == func()'
|
|
|
|
Whether actually writing that kind of code is a good idea would be a topic for
|
|
style guides and code linters, not the language compiler.
|
|
|
|
In particular, if static analysers can't follow certain kinds of dynamic checks,
|
|
then they can limit the permitted expressions at analysis time, rather than the
|
|
compiler restricting them at compile time.
|
|
|
|
There are also some kinds of expressions that are almost certain to give
|
|
nonsensical results (e.g. ``yield``, ``yield from``, ``await``) due to the
|
|
pattern caching rule, where the number of times the constraint expression
|
|
actually gets evaluated will be implementation dependent. Even here, the PEP
|
|
takes the view of letting users write nonsense if they really want to.
|
|
|
|
Aside from the recently updated decorator expressions, another situation where
|
|
Python's formal syntax offers full freedom of expression that is almost never
|
|
used in practice is in ``except`` clauses: the exceptions to match against
|
|
almost always take the form of a simple name, a dotted name, or a tuple of
|
|
those, but the language grammar permits arbitrary expressions at that point.
|
|
This is a good indication that Python's user base can be trusted to
|
|
take responsibility for finding readable ways to use permissive language
|
|
features, by avoiding writing hard to read constructs even when they're
|
|
permitted by the compiler.
|
|
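To make the ``except`` clause point concrete, the following sketch shows a function call used where an exception class is normally written. The function names are hypothetical, and this style is shown only to demonstrate that the grammar permits it:

```python
def recoverable_errors():
    """Return the exception types to catch, computed at runtime."""
    return (KeyError, IndexError)


def lookup(container, key):
    try:
        return container[key]
    except recoverable_errors():  # arbitrary expression, evaluated on error
        return None
```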
|
|
This permissiveness comes with a real concrete benefit on the implementation
|
|
side: dozens of lines of match statement specific code in the compiler are
|
|
replaced by simple calls to the existing code for compiling expressions
|
|
(including in the AST validation pass, the AST optimization pass, the symbol
|
|
table analysis pass, and the code generation pass). This implementation
|
|
benefit would accrue not just to CPython, but to every other Python
|
|
implementation looking to add match statement support.
|
|
|
|
|
|
Requiring the use of constraint prefix markers for mapping pattern keys
|
|
-----------------------------------------------------------------------
|
|
|
|
The initial (unpublished) draft of this proposal suggested requiring mapping
|
|
pattern keys be value constraints, just as :pep:`634` requires that they be valid
|
|
literal or value patterns::
|
|
|
|
import constants
|
|
|
|
match config:
|
|
case {== "route": route}:
|
|
process_route(route)
|
|
case {== constants.DEFAULT_PORT: sub_config, **rest}:
|
|
process_config(sub_config, rest)
|
|
|
|
However, the extra characters were syntactically noisy and, unlike its use in
|
|
value constraints (where it distinguishes them from non-pattern expressions),
|
|
the prefix doesn't provide any additional information here that isn't already
|
|
conveyed by the expression's position as a key within a mapping pattern.
|
|
|
|
Accordingly, the proposal was simplified to omit the marker prefix from mapping
|
|
pattern keys.
|
|
|
|
This omission also aligns with the fact that containers may incorporate both
|
|
identity and equality checks into their lookup process - they don't purely
|
|
rely on equality checks, as would be incorrectly implied by the use of the
|
|
equality constraint prefix.
|
|
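The classic demonstration of this behaviour uses a NaN value, which is never equal to itself but can still be found in a ``dict`` by identity. The variable names are arbitrary:

```python
nan = float("nan")
table = {nan: "found"}

# NaN never compares equal, not even to itself.
equal_to_itself = (nan == nan)

# The lookup still succeeds, because dict checks identity before equality.
found_by_identity = table.get(nan)

# A different NaN object is neither identical nor equal, so it is not found.
found_by_other_nan = table.get(float("nan"))
```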
|
|
|
|
Allowing the key/value separator to be omitted for mapping value constraints
|
|
----------------------------------------------------------------------------
|
|
|
|
Instance attribute patterns allow the ``:`` separator to be omitted when
|
|
writing attribute value constraints like ``case object{.attr == expr}``.
|
|
|
|
Offering a similar shorthand for mapping value constraints was considered, but
|
|
permitting it allows thoroughly baffling constructs like ``case {0 == 0}:``
|
|
where the compiler knows this is the key ``0`` with the value constraint
|
|
``== 0``, but a human reader sees the tautological comparison operation
|
|
``0 == 0``. With the key/value separator included, the intent is more obvious to
|
|
a human reader as well: ``case {0: == 0}:``
|
|
|
|
|
|
Reference Implementation
|
|
========================
|
|
|
|
A draft reference implementation for this PEP [3]_ has been derived from Brandt
|
|
Bucher's reference implementation for :pep:`634` [4]_.
|
|
|
|
Relative to the text of this PEP, the draft reference implementation has not
|
|
yet complemented the special casing of several builtin and standard library
|
|
types in ``MATCH_CLASS`` with the more general check for ``__match_args__``
|
|
being set to ``None``. Class defined patterns also currently still accept
|
|
classes that don't define ``__match_args__``.
|
|
|
|
All other modified patterns have been updated to follow this PEP rather than
|
|
:pep:`634`.
|
|
|
|
Unparsing for match patterns has not yet been migrated to the updated v3 AST.
|
|
|
|
The AST validator for match patterns has not yet been implemented.
|
|
|
|
The AST validator in general has not yet been reviewed to ensure that it is
|
|
checking that only expression nodes are being passed in where expression nodes
|
|
are expected.
|
|
|
|
The examples in this PEP have not yet been converted to test cases, so could
|
|
plausibly contain typos and other errors.
|
|
|
|
Several of the old :pep:`634` tests are still to be converted to new SyntaxError
|
|
tests.
|
|
|
|
The documentation has not yet been updated.
|
|
|
|
|
|
Acknowledgments
|
|
===============
|
|
|
|
The :pep:`622` and :pep:`634`/:pep:`635`/:pep:`636` authors, as the proposal in
|
|
this PEP is merely
|
|
an attempt to improve the readability of an already well-constructed idea by
|
|
proposing that starting with a more explicit syntax and potentially introducing
|
|
syntactic shortcuts for particularly common operations later is a better option
|
|
than attempting to *only* define the shortcut version. For areas of the
|
|
specification where the two PEPs are the same (or at least very similar), the
|
|
text describing the intended behaviour in this PEP is often derived directly
|
|
from the :pep:`634` text.
|
|
|
|
Steven D'Aprano, who made a compelling case that the key goals of this PEP could
|
|
be achieved by using existing comparison tokens to provide the ability to override
|
|
the compiler when our guesses as to "what most users will want most of the time"
|
|
are inevitably incorrect for at least some users some of the time, and retaining
|
|
some of :pep:`634`'s syntactic sugar (with a slightly different semantic definition)
|
|
to obtain the same level of brevity as :pep:`634` in most situations. (Paul
|
|
Sokolovsky also independently suggested using ``==`` instead of ``?`` as a
|
|
more easily understood prefix for equality constraints).
|
|
|
|
Thomas Wouters, whose publication of :pep:`640` and public review of the structural
|
|
pattern matching proposals persuaded the author of this PEP to continue
|
|
advocating for a wildcard pattern syntax that a future PEP could plausibly turn
|
|
into a hard keyword that always skips binding a reference in any location a
|
|
simple name is expected, rather than continuing indefinitely as the match
|
|
pattern specific soft keyword that is proposed here.
|
|
|
|
Joao Bueno and Jim Jewett for nudging the PEP author to take a closer look at
|
|
the proposed syntax for subelement capturing within class patterns and mapping
|
|
patterns (particularly the problems with "capturing to the right"). This
|
|
review is what prompted the significant changes between v2 and v3 of the
|
|
proposal.
|
|
|
|
|
|
References
|
|
==========
|
|
|
|
.. [1] Post explaining the syntactic novelties in PEP 622
|
|
   https://mail.python.org/archives/list/python-dev@python.org/message/2VRPDW4EE243QT3QNNCO7XFZYZGIY6N3/
|
|
|
|
.. [2] Declined pull request proposing to list this as a Rejected Idea in PEP 622
|
|
https://github.com/python/peps/pull/1564
|
|
|
|
.. [3] In-progress reference implementation for this PEP
|
|
https://github.com/ncoghlan/cpython/tree/pep-642-constraint-patterns
|
|
|
|
.. [4] PEP 634 reference implementation
|
|
https://github.com/python/cpython/pull/22917
|
|
|
|
.. [5] Steven D'Aprano's cogent criticism of the first published iteration of this PEP
|
|
https://mail.python.org/archives/list/python-dev@python.org/message/BTHFWG6MWLHALOD6CHTUFPHAR65YN6BP/
|
|
|
|
.. [6] Thomas Wouters' initial review of the structural pattern matching proposals
|
|
https://mail.python.org/archives/list/python-dev@python.org/thread/4SBR3J5IQUYE752KR7C6432HNBSYKC5X/
|
|
|
|
.. [7] Stack Overflow answer regarding the use cases for ``_`` as an identifier
|
|
https://stackoverflow.com/questions/5893163/what-is-the-purpose-of-the-single-underscore-variable-in-python/5893946#5893946
|
|
|
|
.. [8] Pre-publication draft of "Precise Semantics for Pattern Matching"
|
|
https://github.com/markshannon/pattern-matching/blob/master/precise_semantics.rst
|
|
|
|
.. [9] Kohn et al., Dynamic Pattern Matching with Python
|
|
https://gvanrossum.github.io/docs/PyPatternMatching.pdf
|
|
|
|
|
|
.. _642-appendix-a:
|
|
|
|
Appendix A -- Full Grammar
|
|
==========================
|
|
|
|
Here is the full modified grammar for ``match_stmt``, replacing Appendix A
|
|
in :pep:`634`.
|
|
|
|
Notation used beyond standard EBNF is as per :pep:`634`:
|
|
|
|
- ``'KWD'`` denotes a hard keyword
|
|
- ``"KWD"`` denotes a soft keyword
|
|
- ``SEP.RULE+`` is shorthand for ``RULE (SEP RULE)*``
|
|
- ``!RULE`` is a negative lookahead assertion
|
|
|
|
::
|
|
|
|
match_stmt: "match" subject_expr ':' NEWLINE INDENT case_block+ DEDENT
|
|
subject_expr:
|
|
| star_named_expression ',' [star_named_expressions]
|
|
| named_expression
|
|
case_block: "case" (guarded_pattern | open_pattern) ':' block
|
|
|
|
guarded_pattern: closed_pattern 'if' named_expression
|
|
open_pattern: # Pattern may use multiple tokens with no closing delimiter
|
|
| as_pattern
|
|
| or_pattern
|
|
|
|
as_pattern: [closed_pattern] pattern_as_clause
|
|
as_pattern_with_inferred_wildcard: pattern_as_clause
|
|
pattern_as_clause: 'as' pattern_capture_target
|
|
pattern_capture_target: !"__" NAME !('.' | '(' | '=')
|
|
|
|
or_pattern: '|'.simple_pattern+
|
|
|
|
simple_pattern: # Subnode where "as" and "or" patterns must be parenthesised
|
|
| closed_pattern
|
|
| value_constraint
|
|
|
|
value_constraint:
|
|
| eq_constraint
|
|
| id_constraint
|
|
|
|
eq_constraint: '==' closed_expr
|
|
id_constraint: 'is' closed_expr
|
|
|
|
closed_expr: # Require a single token or a closing delimiter in expression
|
|
| primary
|
|
| closed_factor
|
|
|
|
closed_factor: # "factor" is the main grammar node for these unary ops
|
|
| '+' primary
|
|
| '-' primary
|
|
| '~' primary
|
|
|
|
closed_pattern: # Require a single token or a closing delimiter in pattern
|
|
| wildcard_pattern
|
|
| group_pattern
|
|
| structural_constraint
|
|
|
|
wildcard_pattern: "__"
|
|
|
|
group_pattern: '(' open_pattern ')'
|
|
|
|
structural_constraint:
|
|
| sequence_constraint
|
|
| mapping_constraint
|
|
| attrs_constraint
|
|
| class_constraint
|
|
|
|
sequence_constraint: '[' [sequence_constraint_elements] ']'
|
|
sequence_constraint_elements: ','.sequence_constraint_element+ ','?
|
|
sequence_constraint_element:
|
|
| star_pattern
|
|
| simple_pattern
|
|
| as_pattern_with_inferred_wildcard
|
|
star_pattern: '*' (pattern_as_clause | wildcard_pattern)
|
|
|
|
mapping_constraint: '{' [mapping_constraint_elements] '}'
|
|
mapping_constraint_elements: ','.key_value_constraint+ ','?
|
|
key_value_constraint:
|
|
| closed_expr pattern_as_clause
|
|
| closed_expr ':' simple_pattern
|
|
| double_star_capture
|
|
double_star_capture: '**' pattern_as_clause
|
|
|
|
attrs_constraint:
|
|
| name_or_attr '{' [attrs_constraint_elements] '}'
|
|
name_or_attr: attr | NAME
|
|
attr: name_or_attr '.' NAME
|
|
attrs_constraint_elements: ','.attr_value_constraint+ ','?
|
|
attr_value_constraint:
|
|
| '.' NAME pattern_as_clause
|
|
| '.' NAME value_constraint
|
|
| '.' NAME ':' simple_pattern
|
|
| '.' NAME
|
|
|
|
class_constraint:
|
|
| name_or_attr '(' ')'
|
|
| name_or_attr '(' positional_patterns ','? ')'
|
|
| name_or_attr '(' class_constraint_attrs ')'
|
|
    | name_or_attr '(' positional_patterns ',' class_constraint_attrs ')'
|
|
positional_patterns: ','.positional_pattern+
|
|
positional_pattern:
|
|
| simple_pattern
|
|
| as_pattern_with_inferred_wildcard
|
|
class_constraint_attrs:
|
|
| '**' '{' [attrs_constraint_elements] '}'
|
|
|
|
|
|
.. _642-appendix-b:
|
|
|
|
Appendix B: Summary of Abstract Syntax Tree changes
|
|
===================================================
|
|
|
|
The following new nodes are added to the AST by this PEP::
|
|
|
|
stmt = ...
|
|
| ...
|
|
| Match(expr subject, match_case* cases)
|
|
| ...
|
|
...
|
|
|
|
match_case = (pattern pattern, expr? guard, stmt* body)
|
|
|
|
pattern = MatchAlways
|
|
| MatchValue(matchop op, expr value)
|
|
| MatchSequence(pattern* patterns)
|
|
| MatchMapping(expr* keys, pattern* patterns)
|
|
| MatchAttrs(expr cls, identifier* attrs, pattern* patterns)
|
|
| MatchClass(expr cls, pattern* patterns, identifier* extra_attrs, pattern* extra_patterns)
|
|
|
|
| MatchRestOfSequence(identifier? target)
|
|
-- A NULL entry in the MatchMapping key list handles capturing extra mapping keys
|
|
|
|
| MatchAs(pattern? pattern, identifier target)
|
|
| MatchOr(pattern* patterns)
|
|
|
|
attributes (int lineno, int col_offset, int? end_lineno, int? end_col_offset)
|
|
|
|
matchop = EqCheck | IdCheck
|
|
|
|
|
|
.. _642-appendix-c:
|
|
|
|
Appendix C: Summary of changes relative to PEP 634
|
|
==================================================
|
|
|
|
The overall ``match``/``case`` statement syntax and the guard expression syntax
|
|
remain the same as they are in :pep:`634`.
|
|
|
|
Relative to :pep:`634` this PEP makes the following key changes:
|
|
|
|
* a new ``pattern`` type is defined in the AST, rather than reusing the ``expr``
|
|
type for patterns
|
|
* the new ``MatchAs`` and ``MatchOr`` AST nodes are moved from the ``expr``
|
|
type to the ``pattern`` type
|
|
* the wildcard pattern changes from ``_`` (single underscore) to ``__`` (double
|
|
underscore), and gains a dedicated ``MatchAlways`` node in the AST
|
|
* due to ambiguity of intent, value patterns and literal patterns are removed
|
|
* a new expression category is introduced: "closed expressions"
|
|
* closed expressions are either primary expressions, or a closed expression
|
|
preceded by one of the high precedence unary operators (``+``, ``-``, ``~``)
|
|
* a new pattern type is introduced: "value constraint patterns"
|
|
* value constraints have a dedicated ``MatchValue`` AST node rather than
|
|
allowing a combination of ``Constant`` (literals), ``UnaryOp``
|
|
(negative numbers), ``BinOp`` (complex numbers), and ``Attribute`` (attribute
|
|
lookups)
|
|
* value constraint patterns are either equality constraints or identity constraints
|
|
* equality constraints use ``==`` as a prefix marker on an otherwise
|
|
arbitrary closed expression: ``== EXPR``
|
|
* identity constraints use ``is`` as a prefix marker on an otherwise
|
|
arbitrary closed expression: ``is EXPR``
|
|
* due to ambiguity of intent, capture patterns are removed. All capture operations
|
|
use the ``as`` keyword (even in sequence matching) and are represented in the
|
|
AST as either ``MatchAs`` or ``MatchRestOfSequence`` nodes.
|
|
* to reduce verbosity in AS patterns, ``as NAME`` is permitted, with the same
|
|
meaning as ``__ as NAME``
|
|
* sequence patterns change to *require* the use of square brackets, rather than
|
|
offering the same syntactic flexibility as assignment targets (assignment
|
|
statements allow iterable unpacking to be indicated by any use of a tuple
|
|
separated target, with or without surrounding parentheses or square brackets)
|
|
* sequence patterns gain a dedicated ``MatchSequence`` AST node rather than
|
|
reusing ``List``
|
|
* mapping patterns change to allow arbitrary closed expressions as keys
|
|
* mapping patterns gain a dedicated ``MatchMapping`` AST node rather than
|
|
reusing ``Dict``
|
|
* to reduce verbosity in mapping patterns, ``KEY : __ as NAME`` may be shortened
|
|
to ``KEY as NAME``
|
|
* class patterns no longer use individual keyword argument syntax for attribute
|
|
matching. Instead they use double-star syntax, along with a variant on mapping
|
|
pattern syntax with a dot prefix on the attribute names
|
|
* class patterns gain a dedicated ``MatchClass`` AST node rather than
|
|
reusing ``Call``
|
|
* to reduce verbosity, class attribute matching allows ``:`` to be omitted when
|
|
the pattern to be matched starts with ``==``, ``is``, or ``as``
|
|
* class patterns treat any class that sets ``__match_args__`` to ``None`` as
|
|
accepting a single positional pattern that is matched against the entire
|
|
object (avoiding the special casing required in :pep:`634`)
|
|
* class patterns raise ``TypeError`` when used with an object that does not
|
|
define ``__match_args__``
|
|
* dedicated syntax for ducktyping is added, such that ``case cls{...}:`` is
|
|
roughly equivalent to ``case cls(**{...}):``, but skips the check for the
|
|
existence of ``__match_args__``. This pattern also has a dedicated AST node,
|
|
``MatchAttrs``
|
|
|
|
Note that postponing literal patterns also makes it possible to postpone the
|
|
question of whether we need an "INUMBER" token in the tokeniser for imaginary
|
|
literals. Without it, the parser can't distinguish complex literals from other
|
|
binary addition and subtraction operations on constants, so proposals like
|
|
:pep:`634` have to do work in later compilation steps to check for correct usage.
|
|
|
|
|
|
.. _642-appendix-d:
|
|
|
|
Appendix D: History of changes to this proposal
|
|
===============================================
|
|
|
|
The first published iteration of this proposal mostly followed :pep:`634`, but
|
|
suggested using ``?EXPR`` for equality constraints and ``?is EXPR`` for
|
|
identity constraints rather than :pep:`634`'s value patterns and literal patterns.
|
|
|
|
The second published iteration mostly adopted a counter-proposal from Steven
|
|
D'Aprano that kept the :pep:`634` style inferred constraints in many situations,
|
|
but also allowed the use of ``== EXPR`` for explicit equality constraints, and
|
|
``is EXPR`` for explicit identity constraints.
|
|
|
|
The third published (and current) iteration dropped inferred patterns entirely,
|
|
in an attempt to resolve the concerns with the fact that the patterns
|
|
``case {key: NAME}:`` and ``case cls(attr=NAME):`` would both bind ``NAME``
|
|
despite it appearing to the right of another subexpression without using the
|
|
``as`` keyword. The revised proposal also eliminates the possibility of writing
|
|
``case TARGET1 as TARGET2:``, which would bind to both of the given names. Of
|
|
those constructs, the most concerning was ``case cls(attr=TARGET_NAME):``, since it
|
|
involved the use of ``=`` with the binding target on the right, the exact
|
|
opposite of what happens in assignment statements, function calls, and
|
|
function signature declarations.
|
|
|
|
|
|
Copyright
|
|
=========
|
|
|
|
This document is placed in the public domain or under the
|
|
CC0-1.0-Universal license, whichever is more permissive.
|