PEP: 642
Title: Explicit Pattern Syntax for Structural Pattern Matching
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan@gmail.com>
BDFL-Delegate:
Discussions-To: Python-Dev <python-dev@python.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Requires: 634
Created: 26-Sep-2020
Python-Version: 3.10
Post-History: 31-Oct-2020, 8-Nov-2020, 3-Jan-2021
Resolution:

Abstract
========

This PEP covers an alternative syntax proposal for PEP 634's structural pattern
matching that requires explicit prefixes on all capture patterns and value
constraints. It also proposes a new dedicated syntax for instance attribute
patterns that aligns more closely with the proposed mapping pattern syntax.

While the result is necessarily more verbose than the proposed syntax in
PEP 634, it is still significantly less verbose than the status quo.

As an example, the following match statement would extract "host" and "port"
details from a 2 item sequence, a mapping with "host" and "port" keys, any
object with "host" and "port" attributes, or a "host:port" string, treating
the "port" as optional in the latter three cases::

    port = DEFAULT_PORT
    match expr:
        case [as host, as port]:
            pass
        case {"host" as host, "port" as port}:
            pass
        case {"host" as host}:
            pass
        case object{.host as host, .port as port}:
            pass
        case object{.host as host}:
            pass
        case str{} as addr:
            host, __, optional_port = addr.partition(":")
            if optional_port:
                port = optional_port
        case __ as m:
            raise TypeError(f"Unknown address format: {m!r:.200}")
    port = int(port)


At a high level, this PEP proposes to categorise the different available pattern
types as follows:

* wildcard pattern: ``__``
* group patterns: ``(PTRN)``
* value constraint patterns:

  * equality constraints: ``== EXPR``
  * identity constraints: ``is EXPR``

* structural constraint patterns:

  * sequence constraint patterns: ``[PTRN, as NAME, PTRN as NAME]``
  * mapping constraint patterns: ``{EXPR: PTRN, EXPR as NAME}``
  * instance attribute constraint patterns:
    ``CLS{.NAME, .NAME: PTRN, .NAME == EXPR, .NAME as NAME}``
  * class defined constraint patterns:
    ``CLS(PTRN, PTRN, **{.NAME, .NAME: PTRN, .NAME == EXPR, .NAME as NAME})``

* OR patterns: ``PTRN | PTRN | PTRN``
* AS patterns: ``PTRN as NAME`` (omitting the pattern implies ``__``)

The intent of this approach is to:

* allow an initial form of pattern matching to be developed and released without
  needing to decide up front on the best default options for handling bare names,
  attribute lookups, and literal values
* ensure that pattern matching is defined explicitly at the Abstract Syntax Tree
  level, allowing the specifications of the semantics and the surface syntax for
  pattern matching to be clearly separated
* define a clear and concise "ducktyping" syntax that could potentially be
  adopted in ordinary expressions as a way to more easily retrieve a tuple
  containing multiple attributes from the same object

Relative to PEP 634, the proposal also deliberately eliminates any syntax that
"binds to the right" without using the ``as`` keyword (using capture patterns
in PEP 634's mapping patterns and class patterns) or binds to both the left and
the right in the same pattern (using PEP 634's capture patterns with AS patterns).


Relationship with other PEPs
============================

This PEP both depends on and competes with PEP 634 - the PEP author agrees that
match statements would be a sufficiently valuable addition to the language to
be worth the additional complexity that they add to the learning process, but
disagrees with the idea that "simple name vs literal or attribute lookup"
really offers an adequate syntactic distinction between name binding and value
lookup operations in match patterns (at least for Python).

This PEP agrees with the spirit of PEP 640 (that the chosen wildcard pattern to
skip a name binding should be supported everywhere, not just in match patterns),
but is now proposing a different spelling for the wildcard syntax (``__`` rather
than ``?``). As such, it competes with PEP 640 as written, but would complement
a proposal to deprecate the use of ``__`` as an ordinary identifier and instead
turn it into a general purpose wildcard marker that always skips making a new
local variable binding.

While it has not yet been put forward as a PEP, Mark Shannon has a pre-PEP draft
[8_] expressing several concerns about the runtime semantics of the pattern
matching proposal in PEP 634. This PEP is somewhat complementary to that one, as
even though this PEP is mostly about surface syntax changes rather than major
semantic changes, it does propose that the Abstract Syntax Tree definition be
made more explicit to better separate the details of the surface syntax from the
semantics of the code generation step. There is one specific idea in that pre-PEP
draft that this PEP explicitly rejects: the idea that the different kinds of
matching are mutually exclusive. It's entirely possible for the same value to
match different kinds of structural pattern, and which one takes precedence will
intentionally be governed by the order of the cases in the match statement.


Motivation
==========

The original PEP 622 (which was later split into PEP 634, PEP 635, and PEP 636)
incorporated an unstated but essential assumption in its syntax design: that
neither ordinary expressions *nor* the existing assignment target syntax provide
an adequate foundation for the syntax used in match patterns.

While the PEP didn't explicitly state this assumption, one of the PEP authors
explained it clearly on python-dev [1_]:

    The actual problem that I see is that we have different cultures/intuitions
    fundamentally clashing here. In particular, so many programmers welcome
    pattern matching as an "extended switch statement" and find it therefore
    strange that names are binding and not expressions for comparison. Others
    argue that it is at odds with current assignment statements, say, and
    question why dotted names are _/not/_ binding. What all groups seem to
    have in common, though, is that they refer to _/their/_ understanding and
    interpretation of the new match statement as 'consistent' or 'intuitive'
    --- naturally pointing out where we as PEP authors went wrong with our
    design.

    But here is the catch: at least in the Python world, pattern matching as
    proposed by this PEP is an unprecedented and new way of approaching a common
    problem. It is not simply an extension of something already there. Even
    worse: while designing the PEP we found that no matter from which angle you
    approach it, you will run into issues of seeming 'inconsistencies' (which is
    to say that pattern matching cannot be reduced to a 'linear' extension of
    existing features in a meaningful way): there is always something that goes
    fundamentally beyond what is already there in Python. That's why I argue
    that arguments based on what is 'intuitive' or 'consistent' just do not
    make sense _/in this case/_.

The first iteration of this PEP was then born out of an attempt to show that the
second assertion was not accurate, and that match patterns could be treated
as a variation on assignment targets without leading to inherent contradictions.
(An earlier PR submitted to list this option in the "Rejected Ideas" section
of the original PEP 622 had previously been declined [2_]).

However, the review process for this PEP strongly suggested that not only did
the contradictions that Tobias mentioned in his email exist, but they were also
concerning enough to cast doubts on the syntax proposal presented in PEP 634.
Accordingly, this PEP was changed to go even further than PEP 634, and largely
abandon alignment between the sequence matching syntax and the existing iterable
unpacking syntax (effectively answering "Not really, at least as far as the
exact syntax is concerned" to the first question raised in the DLS'20 paper
[9_]: "Can we extend a feature like iterable unpacking to work for more general
object and data layouts?").

This resulted in a complete reversal of the goals of the PEP: rather than
attempting to emphasise the similarities between assignment and pattern matching,
the PEP now attempts to make sure that assignment target syntax isn't being
reused *at all*, reducing the likelihood of incorrect inferences being drawn
about the new construct based on experience with existing ones.

Finally, before completing the 3rd iteration of the proposal (which dropped
inferred patterns entirely), the PEP author spent quite a bit of time reflecting
on the following entries in PEP 20:

* Explicit is better than implicit.
* Special cases aren't special enough to break the rules.
* In the face of ambiguity, refuse the temptation to guess.

If we start with an explicit syntax, we can always add syntactic shortcuts later
(e.g. consider the recent proposals to add shortcuts for ``Union`` and
``Optional`` type hints only after years of experience with the original more
verbose forms), while if we start out with only the abbreviated forms,
then we don't have any real way to revisit those decisions in a future release.


Specification
=============

This PEP retains the overall ``match``/``case`` statement structure and semantics
from PEP 634, but proposes multiple changes that mean that user intent is
explicitly specified in the concrete syntax rather than needing to be inferred
from the pattern matching context.

In the proposed Abstract Syntax Tree, the semantics are also always explicit,
with no inference required.


The Match Statement
-------------------

Surface syntax::

    match_stmt: "match" subject_expr ':' NEWLINE INDENT case_block+ DEDENT
    subject_expr:
        | star_named_expression ',' star_named_expressions?
        | named_expression
    case_block: "case" (guarded_pattern | open_pattern) ':' block

    guarded_pattern: closed_pattern 'if' named_expression

    open_pattern:
        | as_pattern
        | or_pattern

    closed_pattern:
        | wildcard_pattern
        | group_pattern
        | structural_constraint

Abstract syntax::

    Match(expr subject, match_case* cases)
    match_case = (pattern pattern, expr? guard, stmt* body)


The rules ``star_named_expression``, ``star_named_expressions``,
``named_expression`` and ``block`` are part of the `standard Python
grammar <https://docs.python.org/3.10/reference/grammar.html>`_.

Open patterns are patterns which consist of multiple tokens, and aren't
necessarily terminated by a closing delimiter (for example, ``__ as x``,
``int() | bool()``). To avoid ambiguity for human readers, their usage is
restricted to top level patterns and to group patterns (which are patterns
surrounded by parentheses).

Closed patterns are patterns which either consist of a single token
(i.e. ``__``), or else have a closing delimiter as a required part of their
syntax (e.g. ``[as x, as y]``, ``object{.x as x, .y as y}``).

As in PEP 634, the ``match`` and ``case`` keywords are soft keywords, i.e. they
are not reserved words in other grammatical contexts (including at the
start of a line if there is no colon where expected). This means
that they are recognized as keywords when part of a match
statement or case block only, and are allowed to be used in all
other contexts as variable or argument names.
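
A small sketch of what soft keyword status means in practice: since ``match``
and ``case`` are only treated as keywords inside a match statement, existing
code that uses them as ordinary names is unaffected. The variable names and the
sample string below are illustrative assumptions only.

```python
import re

# "match" and "case" are used as plain identifiers here; neither is reserved.
match = re.match(r"(\w+):(\d+)", "localhost:8080")
case = match.group(1).upper()
port = int(match.group(2))

print(case, port)
```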

Unlike PEP 634, patterns are explicitly defined as a new kind of node in the
abstract syntax tree - even when surface syntax is shared with existing
expression nodes, a distinct abstract node is emitted by the parser.

For context, ``match_stmt`` is a new alternative for
``compound_statement`` in the surface syntax and ``Match`` is a new
alternative for ``stmt`` in the abstract syntax.


Match Semantics
^^^^^^^^^^^^^^^

This PEP largely retains the overall pattern matching semantics proposed in
PEP 634.

The proposed syntax for patterns changes significantly, and is discussed in
detail below.

There are also some proposed changes to the semantics of class defined
constraints (class patterns in PEP 634) to eliminate the need to special case
any builtin types (instead, the introduction of dedicated syntax for instance
attribute constraints allows the behaviour needed by those builtin types to be
specified as applying to any type that sets ``__match_args__`` to ``None``).


.. _guards:

Guards
^^^^^^

This PEP retains the guard clause semantics proposed in PEP 634.

However, the syntax is changed slightly to require that when a guard clause
is present, the case pattern must be a *closed* pattern.

This makes it clearer to the reader where the pattern ends and the guard clause
begins. (This is mainly a potential problem with OR patterns, where the guard
clause resembles the start of a conditional expression in the final
pattern. Actually doing that isn't legal syntax, so there's no ambiguity as far
as the compiler is concerned, but the distinction may not be as clear to a human
reader.)


Irrefutable case blocks
^^^^^^^^^^^^^^^^^^^^^^^

The definition of irrefutable case blocks changes slightly in this PEP relative
to PEP 634, as capture patterns no longer exist as a separate concept from
AS patterns.

Aside from that caveat, the handling of irrefutable cases is the same as in
PEP 634:

* wildcard patterns are irrefutable
* AS patterns whose left-hand side is irrefutable
* OR patterns containing at least one irrefutable pattern
* parenthesized irrefutable patterns
* a case block is considered irrefutable if it has no guard and its
  pattern is irrefutable.
* a match statement may have at most one irrefutable case block, and it
  must be last.
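
The rules above can be sketched as a small check over a simplified pattern
tree. This is only an illustration under stated assumptions: the dataclasses
below loosely mirror a few of the abstract syntax nodes named in this PEP, the
helper names ``is_irrefutable`` and ``check_case_blocks`` are hypothetical, and
a real compiler would perform this check on its own AST. Group patterns need no
node of their own, as parentheses are not represented in the abstract syntax.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class MatchAlways:          # wildcard pattern: __
    pass


@dataclass
class MatchValue:           # value constraint: == EXPR or is EXPR
    op: str
    value: Any


@dataclass
class MatchAs:              # PTRN as NAME (pattern is None for a bare "as NAME")
    pattern: Optional[Any]
    target: str


@dataclass
class MatchOr:              # PTRN | PTRN
    patterns: List[Any]


def is_irrefutable(pattern) -> bool:
    """Return True if the pattern can never fail to match."""
    if isinstance(pattern, MatchAlways):
        return True
    if isinstance(pattern, MatchAs):
        # An omitted left-hand side implies the wildcard pattern.
        return pattern.pattern is None or is_irrefutable(pattern.pattern)
    if isinstance(pattern, MatchOr):
        return any(is_irrefutable(p) for p in pattern.patterns)
    return False


def check_case_blocks(cases):
    """cases is a list of (pattern, has_guard) pairs in source order."""
    last = len(cases) - 1
    for index, (pattern, has_guard) in enumerate(cases):
        if not has_guard and is_irrefutable(pattern) and index != last:
            raise SyntaxError("an irrefutable case block must be the last one")
```

Rejecting any irrefutable case block that is not in the final position covers
both parts of the last rule: at most one such block can exist, and it has to
come last.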


.. _patterns:

Patterns
--------

The top-level surface syntax for patterns is as follows::

    open_pattern: # Pattern may use multiple tokens with no closing delimiter
        | as_pattern
        | or_pattern

    as_pattern: [closed_pattern] pattern_as_clause

    or_pattern: '|'.simple_pattern+

    simple_pattern: # Subnode where "as" and "or" patterns must be parenthesised
        | closed_pattern
        | value_constraint

    closed_pattern: # Require a single token or a closing delimiter in pattern
        | wildcard_pattern
        | group_pattern
        | structural_constraint

As described above, the usage of open patterns is limited to top level case
clauses and when parenthesised in a group pattern.

The abstract syntax for patterns explicitly indicates which elements are
subpatterns and which elements are subexpressions or identifiers::

    pattern = MatchAlways
         | MatchValue(matchop op, expr value)
         | MatchSequence(pattern* patterns)
         | MatchMapping(expr* keys, pattern* patterns)
         | MatchAttrs(expr cls, identifier* attrs, pattern* patterns)
         | MatchClass(expr cls, pattern* patterns, identifier* extra_attrs, pattern* extra_patterns)

         | MatchRestOfSequence(identifier? target)
         -- A NULL entry in the MatchMapping key list handles capturing extra mapping keys

         | MatchAs(pattern? pattern, identifier target)
         | MatchOr(pattern* patterns)


AS Patterns
^^^^^^^^^^^

Surface syntax::

    as_pattern: [closed_pattern] pattern_as_clause
    pattern_as_clause: 'as' pattern_capture_target
    pattern_capture_target: !"__" NAME !('.' | '(' | '=')

(Note: the name on the right may not be ``__``.)

Abstract syntax::

    MatchAs(pattern? pattern, identifier target)

An AS pattern matches the closed pattern on the left of the ``as``
keyword against the subject. If this fails, the AS pattern fails.
Otherwise, the AS pattern binds the subject to the name on the right
of the ``as`` keyword and succeeds.

If no pattern to match is given, the wildcard pattern (``__``) is implied.

To avoid confusion with the `wildcard pattern`_, the double underscore (``__``)
is not permitted as a capture target (this is what ``!"__"`` expresses).

A capture pattern always succeeds. It binds the subject value to the
name using the scoping rules for name binding established for named expressions
in PEP 572. (Summary: the name becomes a local
variable in the closest containing function scope unless there's an
applicable ``nonlocal`` or ``global`` statement.)

In a given pattern, a given name may be bound only once. This
disallows for example ``case [as x, as x]: ...`` but allows
``case [as x] | (as x): ...``.

As an open pattern, the usage of AS patterns is limited to top level case
clauses and when parenthesised in a group pattern. However, several of the
structural constraints allow the use of ``pattern_as_clause`` in relevant
locations to bind extracted elements of the matched subject to local variables.
These are mostly represented in the abstract syntax tree as ``MatchAs`` nodes,
aside from the dedicated ``MatchRestOfSequence`` node in sequence patterns.


OR Patterns
^^^^^^^^^^^

Surface syntax::

    or_pattern: '|'.simple_pattern+

    simple_pattern: # Subnode where "as" and "or" patterns must be parenthesised
        | closed_pattern
        | value_constraint

Abstract syntax::

    MatchOr(pattern* patterns)

When two or more patterns are separated by vertical bars (``|``),
this is called an OR pattern. (A single simple pattern is just that)

Only the final subpattern may be irrefutable.

Each subpattern must bind the same set of names.

An OR pattern matches each of its subpatterns in turn to the subject,
until one succeeds. The OR pattern is then deemed to succeed.
If none of the subpatterns succeed the OR pattern fails.

Subpatterns are mostly required to be closed patterns, but the parentheses may
be omitted for value constraints.
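
The evaluation order described for AS and OR patterns can be emulated in
ordinary Python with a few small combinators. This is a sketch rather than the
proposed implementation: each pattern is modelled as a function that returns a
dict of name bindings on success or ``None`` on failure, and the helper names
(``wildcard``, ``eq``, ``as_``, ``or_``) are assumptions introduced only for
this example.

```python
from typing import Any, Callable, Dict, Optional

Bindings = Optional[Dict[str, Any]]
Pattern = Callable[[Any], Bindings]


def wildcard() -> Pattern:
    """Emulates ``__``: always succeeds and binds nothing."""
    return lambda subject: {}


def eq(value: Any) -> Pattern:
    """Emulates ``== EXPR``: succeeds if the subject compares equal."""
    return lambda subject: {} if subject == value else None


def as_(pattern: Optional[Pattern], name: str) -> Pattern:
    """Emulates ``PTRN as NAME``; omitting the pattern implies the wildcard."""
    inner = wildcard() if pattern is None else pattern

    def matcher(subject: Any) -> Bindings:
        bindings = inner(subject)
        if bindings is None:
            return None                      # left-hand side failed
        return {**bindings, name: subject}   # bind the subject and succeed

    return matcher


def or_(*patterns: Pattern) -> Pattern:
    """Emulates ``PTRN | PTRN``: try each subpattern in turn."""

    def matcher(subject: Any) -> Bindings:
        for pattern in patterns:
            bindings = pattern(subject)
            if bindings is not None:
                return bindings              # first success wins
        return None                          # every subpattern failed

    return matcher


# Corresponds to the proposed pattern ``(== 0 | == 1) as bit``
bit_pattern = as_(or_(eq(0), eq(1)), "bit")
print(bit_pattern(1), bit_pattern(2))
```

The sketch does not enforce the static rules (a name bound at most once, every
OR subpattern binding the same names); those would be checked at compile time.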


.. _value_constraints:

Value constraints
^^^^^^^^^^^^^^^^^

Surface syntax::

    value_constraint:
        | eq_constraint
        | id_constraint

    eq_constraint: '==' closed_expr
    id_constraint: 'is' closed_expr

    closed_expr: # Require a single token or a closing delimiter in expression
        | primary
        | closed_factor

    closed_factor: # "factor" is the main grammar node for these unary ops
        | '+' primary
        | '-' primary
        | '~' primary

Abstract syntax::

    MatchValue(matchop op, expr value)
    matchop = EqCheck | IdCheck


The rule ``primary`` is defined in the standard Python grammar, and only
allows expressions that either consist of a single token, or else are required
to end with a closing delimiter.

Value constraints replace PEP 634's literal patterns and value patterns.

Equality constraints are written as ``== EXPR``, while identity constraints are
written as ``is EXPR``.

An equality constraint succeeds if the subject value compares equal to the
value given on the right, while an identity constraint succeeds only if they are
the exact same object.

The expressions to be compared against are largely restricted to either
single tokens (e.g. names, strings, numbers, builtin constants), or else to
expressions that are required to end with a closing delimiter.

The use of the high precedence unary operators is also permitted, as the risk of
perceived ambiguity is low, and being able to specify negative numbers without
parentheses is desirable.

When the same constraint expression occurs multiple times in the same match
statement, the interpreter may cache the first value calculated and reuse it,
rather than repeat the expression evaluation. (As for PEP 634 value patterns,
this cache is strictly tied to a given execution of a given match statement.)

Unlike literal patterns in PEP 634, this PEP requires that complex
literals be parenthesised to be accepted by the parser. See the Deferred
Ideas section for discussion on that point.

If this PEP were to be adopted in preference to PEP 634, then all literal and
value patterns would instead be written more explicitly as value constraints::

    # Literal patterns
    match number:
        case == 0:
            print("Nothing")
        case == 1:
            print("Just one")
        case == 2:
            print("A couple")
        case == -1:
            print("One less than nothing")
        case == (1-1j):
            print("Good luck with that...")

    # Additional literal patterns
    match value:
        case == True:
            print("True or 1")
        case == False:
            print("False or 0")
        case == None:
            print("None")
        case == "Hello":
            print("Text 'Hello'")
        case == b"World!":
            print("Binary 'World!'")

    # Matching by identity rather than equality
    SENTINEL = object()
    match value:
        case is True:
            print("True, not 1")
        case is False:
            print("False, not 0")
        case is None:
            print("None, following PEP 8 comparison guidelines")
        case is ...:
            print("May be useful when writing __getitem__ methods?")
        case is SENTINEL:
            print("Matches the sentinel by identity, not just value")

    # Matching against variables and attributes
    from enum import Enum
    class Sides(str, Enum):
        SPAM = "Spam"
        EGGS = "eggs"
        ...

    preferred_side = Sides.EGGS
    match entree[-1]:
        case == Sides.SPAM:  # Compares entree[-1] == Sides.SPAM.
            response = "Have you got anything without Spam?"
        case == preferred_side:  # Compares entree[-1] == preferred_side
            response = f"Oh, I love {preferred_side}!"
        case as side:  # Assigns side = entree[-1].
            response = f"Well, could I have their Spam instead of the {side} then?"

Note the ``== preferred_side`` example: using an explicit prefix marker on
constraint expressions removes the restriction to only working with attributes
or literals for value lookups.

The ``== (1-1j)`` example illustrates the use of parentheses to turn any
subexpression into a closed one.
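
Since the proposed syntax cannot be executed by an existing interpreter, the
difference between the two kinds of value constraint can be illustrated with a
plain function instead. This is a minimal sketch: the helper name
``check_value_constraint`` and the lookup table are assumptions for
illustration, standing in for the ``EqCheck`` and ``IdCheck`` operations of the
abstract syntax.

```python
import operator

# Map each constraint marker to the comparison it performs (illustrative only).
_CHECKS = {
    "==": operator.eq,   # equality constraint: subject == value
    "is": operator.is_,  # identity constraint: subject is value
}


def check_value_constraint(op, subject, value):
    """Return True if the subject satisfies the given value constraint."""
    return bool(_CHECKS[op](subject, value))


SENTINEL = object()

# ``== True`` accepts the integer 1, while ``is True`` does not.
print(check_value_constraint("==", 1, True))
print(check_value_constraint("is", 1, True))
```

This mirrors the ``"True or 1"`` and ``"True, not 1"`` comments in the
examples above: only the identity form distinguishes ``True`` from ``1``.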


.. _wildcard_pattern:

Wildcard Pattern
^^^^^^^^^^^^^^^^

Surface syntax::

    wildcard_pattern: "__"

Abstract syntax::

    MatchAlways

A wildcard pattern always succeeds. As in PEP 634, it binds no name.

Where PEP 634 chooses the single underscore as its wildcard pattern for
consistency with other languages, this PEP chooses the double underscore as that
has a clearer path towards potentially being made consistent across the entire
language, whereas that path is blocked for ``"_"`` by i18n related use cases.

Example usage::

    match sequence:
        case [__]:                     # any sequence with a single element
            return True
        case [as start, *__, as end]:  # a sequence with at least two elements
            return start == end
        case __:                       # anything
            return False


Group Patterns
^^^^^^^^^^^^^^

Surface syntax::

    group_pattern: '(' open_pattern ')'

For the syntax of ``open_pattern``, see Patterns above.

A parenthesized pattern has no additional syntax and is not represented in the
abstract syntax tree. It allows users to add parentheses around patterns to
emphasize the intended grouping, and to allow nesting of open patterns when the
grammar requires a closed pattern.

Unlike PEP 634, there is no potential ambiguity with sequence patterns, as
this PEP requires that all sequence patterns be written with square brackets.


Structural constraints
^^^^^^^^^^^^^^^^^^^^^^

Surface syntax::

    structural_constraint:
        | sequence_constraint
        | mapping_constraint
        | attrs_constraint
        | class_constraint

Note: the separate "structural constraint" subcategory isn't used in the
abstract syntax tree, it's merely used as a convenient grouping node in the
surface syntax definition.

Structural constraints are patterns used to both make assertions about complex
objects and to extract values from them.

These patterns may all bind multiple values, either through the use of nested
AS patterns, or else through the use of ``pattern_as_clause`` elements included
in the definition of the pattern.


Sequence constraints
^^^^^^^^^^^^^^^^^^^^

Surface syntax::

    sequence_constraint: '[' [sequence_constraint_elements] ']'
    sequence_constraint_elements: ','.sequence_constraint_element+ ','?
    sequence_constraint_element:
        | star_pattern
        | simple_pattern
        | pattern_as_clause
    star_pattern: '*' (pattern_as_clause | wildcard_pattern)

    simple_pattern: # Subnode where "as" and "or" patterns must be parenthesised
        | closed_pattern
        | value_constraint

    pattern_as_clause: 'as' pattern_capture_target

Abstract syntax::

    MatchSequence(pattern* patterns)

    MatchRestOfSequence(identifier? target)

Sequence constraints allow items within a sequence to be checked and
optionally extracted.

A sequence pattern fails if the subject value is not an instance of
``collections.abc.Sequence``. It also fails if the subject value is
an instance of ``str``, ``bytes`` or ``bytearray`` (see Deferred Ideas for
a discussion on potentially removing the need for this special casing).

A sequence pattern may contain at most one star subpattern. The star
subpattern may occur in any position and is represented in the AST using the
``MatchRestOfSequence`` node.

If no star subpattern is present, the sequence pattern is a fixed-length
sequence pattern; otherwise it is a variable-length sequence pattern.

A fixed-length sequence pattern fails if the length of the subject
sequence is not equal to the number of subpatterns.

A variable-length sequence pattern fails if the length of the subject
sequence is less than the number of non-star subpatterns.

The length of the subject sequence is obtained using the builtin
``len()`` function (i.e., via the ``__len__`` protocol). However, the
interpreter may cache this value in a similar manner as described for
value constraint expressions.

A fixed-length sequence pattern matches the subpatterns to
corresponding items of the subject sequence, from left to right.
Matching stops (with a failure) as soon as a subpattern fails. If all
subpatterns succeed in matching their corresponding item, the sequence
pattern succeeds.

A variable-length sequence pattern first matches the leading non-star
subpatterns to the corresponding items of the subject sequence, as for
a fixed-length sequence. If this succeeds, the star subpattern
matches a list formed of the remaining subject items, with items
removed from the end corresponding to the non-star subpatterns
following the star subpattern. The remaining non-star subpatterns are
then matched to the corresponding subject items, as for a fixed-length
sequence.

Subpatterns are mostly required to be closed patterns, but the parentheses may
be omitted for value constraints. Sequence elements may also be captured
unconditionally without parentheses.

Note: where PEP 634 allows all the same syntactic flexibility as iterable
unpacking in assignment statements, this PEP restricts sequence patterns
specifically to the square bracket form. Given that the open and parenthesised
forms are far more popular than square brackets for iterable unpacking, this
helps emphasise that iterable unpacking and sequence matching are *not* the
same operation. It also avoids the parenthesised form's ambiguity problem
between single element sequence patterns and group patterns.
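
The type and length checks described above, together with the way a star
subpattern divides the subject, can be sketched in ordinary Python. This is an
illustration under stated assumptions rather than the reference
implementation: the function name ``split_sequence`` and its parameters are
hypothetical, and matching the individual subpatterns against the returned
items is left out.

```python
from collections.abc import Sequence


def split_sequence(subject, n_leading, n_trailing=0, has_star=False):
    """Split a subject for a sequence constraint.

    n_leading and n_trailing count the non-star subpatterns before and after
    the star subpattern. Returns (leading, rest, trailing) item lists, with
    rest set to None for a fixed-length pattern, or None if the match fails.
    """
    if not isinstance(subject, Sequence):
        return None
    if isinstance(subject, (str, bytes, bytearray)):
        return None  # explicitly excluded from sequence matching

    length = len(subject)
    required = n_leading + n_trailing
    items = list(subject)

    if not has_star:
        # Fixed-length pattern: the length must match exactly.
        if length != required:
            return None
        return items[:n_leading], None, items[n_leading:]

    # Variable-length pattern: the length must cover the non-star subpatterns.
    if length < required:
        return None
    leading = items[:n_leading]
    rest = items[n_leading:length - n_trailing]
    trailing = items[length - n_trailing:]
    return leading, rest, trailing


# Corresponds to the shape of ``[as first, *as middle, as last]``
print(split_sequence([1, 2, 3, 4], 1, 1, has_star=True))
```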
|
||
|
||
|
||
Mapping constraints
|
||
^^^^^^^^^^^^^^^^^^^
|
||
|
||
Surface syntax::
|
||
|
||
mapping_constraint: '{' [mapping_constraint_elements] '}'
|
||
mapping_constraint_elements: ','.key_value_constraint+ ','?
|
||
key_value_constraint:
|
||
| closed_expr pattern_as_clause
|
||
| closed_expr ':' simple_pattern
|
||
| double_star_capture
|
||
double_star_capture: '**' pattern_as_clause
|
||
|
||
(Note that ``**__`` is deliberately disallowed by this syntax, as additional
|
||
mapping entries are ignored by default)
|
||
|
||
closed_expr is defined above, under value constraints.
|
||
|
||
Abstract syntax::
|
||
|
||
MatchMapping(expr* keys, pattern* patterns)
|
||
|
||
Mapping constraints allow keys and values within a sequence to be checked and
|
||
values to optionally be extracted.
|
||
|
||
A mapping pattern fails if the subject value is not an instance of
|
||
``collections.abc.Mapping``.
|
||
|
||
A mapping pattern succeeds if every key given in the mapping pattern
|
||
is present in the subject mapping, and the pattern for
|
||
each key matches the corresponding item of the subject mapping.
|
||
|
||
The presence of keys is checked using the two argument form of the ``get``
|
||
method and a unique sentinel value, which offers the following benefits:
|
||
|
||
* no exceptions need to be created in the lookup process
|
||
* mappings that implement ``__missing__`` (such as ``collections.defaultdict``)
|
||
only match on keys that they already contain, they don't implicitly add keys
|
||
|
||
A mapping pattern may not contain duplicate key values. If duplicate keys are
detected when checking the mapping pattern, the pattern is considered invalid,
and a ``ValueError`` is raised. While it would theoretically be possible to
check for duplicated constant keys at compile time, no such check is currently
defined or implemented.

(Note: This semantic description is derived from the PEP 634 reference
implementation, which differs from the PEP 634 specification text at time of
writing. The implementation seems reasonable, so amending the PEP text seems
like the best way to resolve the discrepancy)

If a ``'**' as NAME`` double star pattern is present, that name is bound to a
``dict`` containing any remaining key-value pairs from the subject mapping
(the dict will be empty if there are no additional key-value pairs).

A mapping pattern may contain at most one double star pattern,
and it must be last.

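As a rough sketch of what the double star capture binds, the following helper
builds the "remaining items" dict from a subject mapping and the keys already
consumed by the other entries. The name ``remaining_items`` is hypothetical
and chosen purely for illustration.

```python
def remaining_items(subject, matched_keys):
    """Build the dict that a ``'**' as NAME`` entry would bind."""
    matched = set(matched_keys)
    return {key: value for key, value in subject.items() if key not in matched}


address = {"host": "example.org", "port": 8080, "scheme": "https"}

rest = remaining_items(address, ["host", "port"])
nothing_left = remaining_items(address, address.keys())
```

Note that the result is always a new ``dict``, even when the subject is some
other mapping type and even when nothing is left over.
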
Value subpatterns are mostly required to be closed patterns, but the parentheses
may be omitted for value constraints (the ``:`` key/value separator is still
required to ensure the entry doesn't look like an ordinary comparison operation).

Mapping values may also be captured unconditionally using the ``KEY as NAME``
form, without either parentheses or the ``:`` key/value separator.


Instance attribute constraints
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Surface syntax::

    attrs_constraint:
        | name_or_attr '{' [attrs_constraint_elements] '}'
    attrs_constraint_elements: ','.attr_value_pattern+ ','?
    attr_value_pattern:
        | '.' NAME pattern_as_clause
        | '.' NAME value_constraint
        | '.' NAME ':' simple_pattern
        | '.' NAME

Abstract syntax::

    MatchAttrs(expr cls, identifier* attrs, pattern* patterns)

Instance attribute constraints allow an instance's type to be checked and
attributes to optionally be extracted.

An instance attribute constraint may not repeat the same attribute name multiple
times. Attempting to do so will result in a syntax error.

An instance attribute pattern fails if the subject is not an instance of
``name_or_attr``. This is tested using ``isinstance()``.

If ``name_or_attr`` is not an instance of the builtin ``type``,
``TypeError`` is raised.

If no attribute subpatterns are present, the constraint succeeds if the
``isinstance()`` check succeeds. Otherwise:

- Each given attribute name is looked up as an attribute on the subject.

  - If this raises an exception other than ``AttributeError``,
    the exception bubbles up.

  - If this raises ``AttributeError`` the constraint fails.

  - Otherwise, the subpattern associated with the attribute name is matched
    against the attribute value. If no subpattern is specified, the wildcard
    pattern is assumed. If this fails, the constraint fails.
    If it succeeds, the match proceeds to the next attribute.

- If all attribute subpatterns succeed, the constraint as a whole succeeds.

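The matching steps above can be emulated with a small helper. This is a sketch
under stated assumptions rather than the reference implementation:
``match_attrs`` is a hypothetical function, subpatterns are modelled as plain
predicates, and ``None`` stands in for the wildcard pattern.

```python
from types import SimpleNamespace


def match_attrs(subject, cls, attr_patterns):
    """Emulate ``cls{.attr ..., ...}``.

    ``attr_patterns`` maps attribute names to a predicate, or to None for the
    wildcard. Returns the extracted values on success and None on failure.
    """
    if not isinstance(cls, type):
        raise TypeError(f"{cls!r} is not a type")
    if not isinstance(subject, cls):
        return None
    values = {}
    for name, predicate in attr_patterns.items():
        try:
            value = getattr(subject, name)
        except AttributeError:
            return None  # any other exception propagates to the caller
        if predicate is not None and not predicate(value):
            return None
        values[name] = value
    return values


addr = SimpleNamespace(host="example.org", port=8080)

both = match_attrs(addr, object, {"host": None, "port": None})
missing_attr = match_attrs(addr, object, {"host": None, "scheme": None})
wrong_value = match_attrs(addr, object, {"port": lambda port: port == 443})
```

Because a successful match with no attribute subpatterns returns an empty
dict, callers of this sketch should test the result with ``is not None``
rather than relying on truthiness.
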
Instance attribute constraints allow ducktyping checks to be implemented by
using ``object`` as the required instance type (e.g.
``case object{.host as host, .port as port}:``).

The syntax being proposed here could potentially also be used as the basis for
a new syntax for retrieving multiple attributes from an object instance in one
assignment statement (e.g. ``host, port = addr{.host, .port}``). See the
Deferred Ideas section for further discussion of this point.


Class defined constraints
^^^^^^^^^^^^^^^^^^^^^^^^^

Surface syntax::

    class_constraint:
        | name_or_attr '(' ')'
        | name_or_attr '(' positional_patterns ','? ')'
        | name_or_attr '(' class_constraint_attrs ')'
        | name_or_attr '(' positional_patterns ',' class_constraint_attrs ')'
    positional_patterns: ','.positional_pattern+
    positional_pattern:
        | simple_pattern
        | pattern_as_clause
    class_constraint_attrs:
        | '**' '{' [attrs_constraint_elements] '}'

Abstract syntax::

    MatchClass(expr cls, pattern* patterns, identifier* extra_attrs, pattern* extra_patterns)

Class defined constraints allow a sequence of common attributes to be
specified on a class and checked positionally, rather than needing to specify
the attribute names in every related match pattern.

As for instance attribute patterns:

- a class defined pattern fails if the subject is not an instance of
  ``name_or_attr``. This is tested using ``isinstance()``.
- if ``name_or_attr`` is not an instance of the builtin ``type``,
  ``TypeError`` is raised.

Regardless of whether or not any arguments are present, the class is checked
for a ``__match_args__`` attribute using the equivalent of
``getattr(cls, "__match_args__", _SENTINEL)``.

If this raises an exception the exception bubbles up.

If the returned value is not a list, tuple, or ``None``, the conversion fails
and ``TypeError`` is raised at runtime.

This means that only types that actually define ``__match_args__`` will be
usable in class defined patterns. Types that don't define ``__match_args__``
will still be usable in instance attribute patterns.

If ``__match_args__`` is ``None``, then only a single positional subpattern is
permitted. Attempting to specify additional attribute patterns either
positionally or using the double star syntax will cause ``TypeError`` to be
raised at runtime.

This positional subpattern is then matched against the entire subject, allowing
a type check to be combined with another match pattern (e.g. checking both
the type and contents of a container, or the type and value of a number).

If ``__match_args__`` is a list or tuple, then the class defined constraint is
converted to an instance attributes constraint as follows:

- if only the double star attribute constraints subpattern is present, matching
  proceeds as if for the equivalent instance attributes constraint.
- if there are more positional subpatterns than the length of
  ``__match_args__`` (as obtained using ``len()``), ``TypeError`` is raised.
- Otherwise, positional pattern ``i`` is converted to an attribute pattern
  using ``__match_args__[i]`` as the attribute name.
- if any element in ``__match_args__`` is not a string, ``TypeError`` is raised.
- once the positional patterns have been converted to attribute patterns, then
  they are combined with any attribute constraints given in the double star
  attribute constraints subpattern, and matching proceeds as if for the
  equivalent instance attributes constraint.

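The conversion rules above can be approximated as follows. This is a minimal
sketch, not the reference implementation: ``attrs_to_check`` is a hypothetical
helper that only computes which attribute names would be checked, and it
returns ``None`` to signal the "match the whole subject" case. It does not
attempt to detect a name that appears both positionally and in the double star
subpattern, as the text above does not specify that case.

```python
_SENTINEL = object()


def attrs_to_check(cls, positional_count, extra_attrs=()):
    """Return the attribute names a class defined constraint would look up."""
    match_args = getattr(cls, "__match_args__", _SENTINEL)
    if match_args is None:
        # Only a single positional subpattern is allowed; it is matched
        # against the entire subject rather than against an attribute.
        if positional_count > 1 or extra_attrs:
            raise TypeError(f"{cls.__name__} accepts one positional subpattern")
        return None
    if not isinstance(match_args, (list, tuple)):
        # Also covers the sentinel, i.e. classes without __match_args__.
        raise TypeError(f"{cls.__name__} has no usable __match_args__")
    if positional_count > len(match_args):
        raise TypeError(f"{cls.__name__} accepts {len(match_args)} positional subpatterns")
    if not all(isinstance(name, str) for name in match_args):
        raise TypeError("__match_args__ elements must be strings")
    return list(match_args[:positional_count]) + list(extra_attrs)


class Point:
    __match_args__ = ("x", "y")


class Whole:
    __match_args__ = None


positional_only = attrs_to_check(Point, 2)
mixed = attrs_to_check(Point, 1, ["y"])
whole_subject = attrs_to_check(Whole, 1)
```

With the names in hand, matching would proceed exactly as for the equivalent
instance attributes constraint.
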
Note: the ``__match_args__ is None`` handling in this PEP replaces the special
casing of ``bool``, ``bytearray``, ``bytes``, ``dict``, ``float``,
``frozenset``, ``int``, ``list``, ``set``, ``str``, and ``tuple`` in PEP 634.
However, the optimised fast path for those types is retained in the
implementation.


Design Discussion
=================

Requiring explicit qualification of simple names in match patterns
------------------------------------------------------------------

The first iteration of this PEP accepted the basic premise of PEP 634 that
iterable unpacking syntax would provide a good foundation for defining a new
syntax for pattern matching.

During the review process, however, two major and one minor ambiguity problems
were highlighted that arise directly from that core assumption:

* most problematically, when binding simple names by default is extended to
  PEP 634's proposed class pattern syntax, the ``ATTR=TARGET_NAME`` construct
  binds to the right without using the ``as`` keyword, and uses the normal
  assignment-to-the-left sigil (``=``) to do it!
* when binding simple names by default is extended to PEP 634's proposed mapping
  pattern syntax, the ``KEY: TARGET_NAME`` construct binds to the right without
  using the ``as`` keyword
* using a PEP 634 capture pattern together with an AS pattern
  (``TARGET_NAME_1 as TARGET_NAME_2``) gives an odd "binds to both the left and
  right" behaviour

The third revision of this PEP accounted for these problems by abandoning the
alignment with iterable unpacking syntax, and instead requiring that all uses
of bare simple names for anything other than a variable lookup be qualified by
a preceding sigil or keyword:

* ``as NAME``: local variable binding
* ``.NAME``: attribute lookup
* ``== NAME``: variable lookup
* ``is NAME``: variable lookup
* any other usage: variable lookup

The key benefit of this approach is that it makes interpretation of simple names
in patterns a local activity: a leading ``as`` indicates a name binding, a
leading ``.`` indicates an attribute lookup, and anything else is a variable
lookup (regardless of whether we're reading a subpattern or a subexpression).

With the syntax now proposed in this PEP, the problematic cases identified above
no longer read poorly:

* ``.ATTR as TARGET_NAME`` is more obviously a binding than ``ATTR=TARGET_NAME``
* ``KEY as TARGET_NAME`` is more obviously a binding than ``KEY: TARGET_NAME``
* ``(as TARGET_NAME_1) as TARGET_NAME_2`` is more obviously two bindings than
  ``TARGET_NAME_1 as TARGET_NAME_2``


Resisting the temptation to guess
---------------------------------

PEP 635 looks at the way pattern matching is used in other languages, and
attempts to use that information to make plausible predictions about the way
pattern matching will be used in Python:

* wanting to extract values to local names will *probably* be more common than
  wanting to match against values stored in local names
* wanting comparison by equality will *probably* be more common than wanting
  comparison by identity
* users will *probably* be able to at least remember that bare names bind values
  and attribute references look up values, even if they can't figure that out
  for themselves without reading the documentation or having someone tell them

To be clear, I think these predictions actually *are* plausible. However, I also
don't think we need to guess about this up front: I think we can start out with
a more explicit syntax that requires users to state their intent using a prefix
marker (either ``as``, ``==``, or ``is``), and then reassess the situation in a
few years based on how pattern matching is actually being used *in Python*.

At that point, we'll be able to choose amongst at least the following options:

* deciding the explicit syntax is concise enough, and not changing anything
* adding inferred identity constraints for one or more of ``None``, ``...``,
  ``True`` and ``False``
* adding inferred equality constraints for other literals (potentially including
  complex literals)
* adding inferred equality constraints for attribute lookups
* adding either inferred equality constraints or inferred capture patterns for
  bare names

All of those ideas could be considered independently on their own merits, rather
than being a potential barrier to introducing pattern matching in the first
place.

If any of these syntactic shortcuts were to eventually be introduced, they'd
also be straightforward to explain in terms of the underlying more explicit
syntax (the leading ``as``, ``==``, or ``is`` would just be getting inferred
by the parser, without the user needing to provide it explicitly). At the
implementation level, only the parser should need to be changed, as the existing
AST nodes could be reused.


Interaction with caching of attribute lookups in local variables
----------------------------------------------------------------

One of the major changes between this PEP and PEP 634 is to use ``== EXPR``
for equality constraint lookups, rather than only offering ``NAME.ATTR``. The
original motivation for this was to avoid the semantic conflict with regular
assignment targets, where ``NAME.ATTR`` is already used in assignment statements
to set attributes. If ``NAME.ATTR`` were the *only* syntax for symbolic value
matching, then we would be pre-emptively ruling out any future attempts to allow
matching against single patterns using the existing assignment statement syntax.
The current motivation is more about the general desire to avoid guessing about
users' intent, and instead requiring them to state it explicitly in the syntax.

However, even within match statements themselves, the ``name.attr`` syntax for
value patterns has an undesirable interaction with local variable assignment,
where routine refactorings that would be semantically neutral for any other
Python statement introduce a major semantic change when applied to a PEP 634
style match statement.

Consider the following code::

    while value < self.limit:
        ... # Some code that adjusts "value"

The attribute lookup can be safely lifted out of the loop and only performed
once::

    _limit = self.limit
    while value < _limit:
        ... # Some code that adjusts "value"

With the marker prefix based syntax proposal in this PEP, value constraints
would be similarly tolerant of match patterns being refactored to use a local
variable instead of an attribute lookup, with the following two statements
being functionally equivalent::

    match expr:
        case {"key": == self.target}:
            ... # Handle the case where 'expr["key"] == self.target'
        case __:
            ... # Handle the non-matching case

    _target = self.target
    match expr:
        case {"key": == _target}:
            ... # Handle the case where 'expr["key"] == self.target'
        case __:
            ... # Handle the non-matching case

By contrast, when using PEP 634's value and capture pattern syntaxes that omit
the marker prefix, the following two statements wouldn't be equivalent at all::

    # PEP 634's value pattern syntax
    match expr:
        case {"key": self.target}:
            ... # Handle the case where 'expr["key"] == self.target'
        case _:
            ... # Handle the non-matching case

    # PEP 634's capture pattern syntax
    _target = self.target
    match expr:
        case {"key": _target}:
            ... # Matches any mapping with "key", binding its value to _target
        case _:
            ... # Handle the non-matching case

This PEP ensures the original semantics are retained under this style of
simplistic refactoring: use ``== name`` to force interpretation of the result
as a value constraint, use ``as name`` for a name binding.

PEP 634's proposal to offer only the shorthand syntax, with no explicitly
prefixed form, means that the primary answer on offer is "Well, don't do that,
then, only compare against attributes in namespaces, don't compare against
simple names".

PEP 622's walrus pattern syntax had another odd interaction where it might not
bind the same object as the exact same walrus expression in the body of the
case clause, but PEP 634 fixed that discrepancy by replacing walrus patterns
with AS patterns (where the fact that the value bound to the name on the RHS
might not be the same value as returned by the LHS is a standard feature common
to all uses of the "as" keyword).


Using existing comparison operators as the value constraint prefix
--------------------------------------------------------------------

If the benefit of a dedicated value constraint prefix is accepted, then the
next question is to ask exactly what that prefix should be.

The initially published version of this PEP proposed using the previously
unused ``?`` symbol as the prefix for equality constraints, and ``?is`` as the
prefix for identity constraints. When reviewing the PEP, Steven D'Aprano
presented a compelling counterproposal [5_] to use the existing comparison
operators (``==`` and ``is``) instead.

There were a few concerns with ``==`` as a prefix that kept it from being
chosen as the prefix in the initial iteration of the PEP:

* for common use cases, it's even more visually noisy than ``?``, as a lot of
  folks with PEP 8 trained aesthetic sensibilities are going to want to put
  a space between it and the following expression, effectively making it a 3
  character prefix instead of 1
* when used in a mapping pattern, there needs to be a space between the ``:``
  key/value separator and the ``==`` prefix, or the tokeniser will split them
  up incorrectly (getting ``:=`` and ``=`` instead of ``:`` and ``==``)
* when used in an OR pattern, there needs to be a space between the ``|``
  pattern separator and the ``==`` prefix, or the tokeniser will split them
  up incorrectly (getting ``|=`` and ``=`` instead of ``|`` and ``==``)
* if used in a PEP 634 style class pattern, there needs to be a space between
  the ``=`` keyword separator and the ``==`` prefix, or the tokeniser will split
  them up incorrectly (getting ``==`` and ``=`` instead of ``=`` and ``==``)

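The tokeniser behaviour behind the last three concerns can be checked with the
standard library ``tokenize`` module, which only splits text into tokens and
so does not need the proposed syntax to be valid. The helper name
``operator_tokens`` is hypothetical, and the sketch assumes an interpreter
where ``:=`` is already a token (Python 3.8 or later).

```python
import io
import tokenize


def operator_tokens(source):
    """Return the operator token strings the standard tokeniser produces."""
    readline = io.StringIO(source + "\n").readline
    return [
        tok.string
        for tok in tokenize.generate_tokens(readline)
        if tok.type == tokenize.OP
    ]


mapping_no_space = operator_tokens('{"key":== target}')
mapping_with_space = operator_tokens('{"key": == target}')
or_no_space = operator_tokens("a |== b")
keyword_no_space = operator_tokens("attr=== b")
```

In each case the tokeniser greedily takes the longest operator it recognises,
which is why the separating space is needed.
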
Rather than introducing a completely new symbol, Steven's proposed resolution to
this verbosity problem was to retain the ability to omit the prefix marker in
syntactically unambiguous cases.

While the idea of omitting the prefix marker was accepted for the second
revision of the proposal, it was dropped again in the third revision due to
ambiguity concerns. Instead, the following points apply:

* for class patterns, other syntax changes allow equality constraints to be
  written as ``.ATTR == EXPR``, and identity constraints to be written as
  ``.ATTR is EXPR``, both of which are quite easy to read
* for mapping patterns, the extra syntactic noise is just tolerated (at least
  for now)
* for OR patterns, the extra syntactic noise is just tolerated (at least
  for now). However, `membership constraints`_ may offer a future path to
  reducing the need to combine OR patterns with equality constraints (instead,
  the values to be checked against would be collected as a set, list, or tuple).

Given that perspective, PEP 635's arguments against using ``?`` as part of the
pattern matching syntax held for this proposal as well, and so the PEP was
amended accordingly.


Using ``__`` as the wildcard pattern marker
-------------------------------------------

PEP 635 makes a solid case that introducing ``?`` *solely* as a wildcard pattern
marker would be a bad idea. With the syntax for value constraints changed
to use existing comparison operations rather than ``?`` and ``?is``, that
argument holds for this PEP as well.

However, as noted by Thomas Wouters in [6_], PEP 634's choice of ``_`` remains
problematic as it would likely mean that match patterns would have a *permanent*
difference from all other parts of Python - the use of ``_`` in software
internationalisation and at the interactive prompt means that there isn't really
a plausible path towards using it as a general purpose "skipped binding" marker.

``__`` is an alternative "this value is not needed" marker drawn from a Stack
Overflow answer [7_] (originally posted by the author of this PEP) on the
various meanings of ``_`` in existing Python code.

This PEP also proposes adopting an implementation technique that limits
the scope of the associated special casing of ``__`` to the parser: defining a
new AST node type (``MatchAlways``) specifically for wildcard markers, rather
than passing it through to the AST as a ``Name`` node.

Within the parser, ``__`` still means either a regular name or a wildcard
marker in a match pattern depending on where you are in the parse tree, but
within the rest of the compiler, ``Name("__")`` is still a normal variable name,
while ``MatchAlways()`` is always a wildcard marker in a match pattern.

Unlike ``_``, the lack of other use cases for ``__`` means that there would be
a plausible path towards restoring identifier handling consistency with the rest
of the language by making ``__`` mean "skip this name binding" everywhere in
Python:

* in the interpreter itself, deprecate loading variables with the name ``__``.
  This would make reading from ``__`` emit a deprecation warning, while writing
  to it would initially be unchanged. To avoid slowing down all name loads, this
  could be handled by having the compiler emit additional code for the
  deprecated name, rather than using a runtime check in the standard name
  loading opcodes.
* after a suitable number of releases, change the parser to emit
  a new ``SkippedBinding`` AST node for all uses of ``__`` as an assignment
  target, and update the rest of the compiler accordingly
* consider making ``__`` a true hard keyword rather than a soft keyword

This deprecation path couldn't be followed for ``_``, as there's no way for the
interpreter to distinguish between attempts to read back ``_`` when nominally
used as a "don't care" marker, and legitimate reads of ``_`` as either an
i18n text translation function or as the last statement result at the
interactive prompt.

Names starting with double-underscores are also already reserved for use by the
language, whether that is for compile time constants (i.e. ``__debug__``),
special methods, or class attribute name mangling, so using ``__`` here would
be consistent with that existing approach.


Representing patterns explicitly in the Abstract Syntax Tree
------------------------------------------------------------

PEP 634 doesn't explicitly discuss how match statements should be represented
in the Abstract Syntax Tree, instead leaving that detail to be defined as part
of the implementation.

As a result, while the reference implementation of PEP 634 definitely works (and
formed the basis of the reference implementation of this PEP), it does contain
a significant design flaw: despite the notes in PEP 635 that patterns should be
considered as distinct from expressions, the reference implementation goes ahead
and represents them in the AST as expression nodes.

The result is an AST that isn't very abstract at all: nodes that should be
compiled completely differently (because they're patterns rather than
expressions) are represented the same way, and the type system of the
implementation language (e.g. C for CPython) can't offer any assistance in
keeping track of which subnodes should be ordinary expressions and which should
be subpatterns.

Rather than continuing with that approach, this PEP has instead defined a new
explicit "pattern" node in the AST, which allows the patterns and their
permitted subnodes to be defined explicitly in the AST itself, making the code
implementing the new feature clearer, and allowing the C compiler to provide
more assistance in keeping track of when the code generator is dealing with
patterns or expressions.

This change in implementation approach is actually orthogonal to the surface
syntax changes proposed in this PEP, so it could still be adopted even if the
rest of the PEP were to be rejected.


Changes to sequence patterns
----------------------------

This PEP makes one notable change to sequence patterns relative to PEP 634:

* only the square bracket form of sequence pattern is supported. Neither open
  (no delimiters) nor tuple style (parentheses as delimiters) sequence patterns
  are supported.

Relative to PEP 634, sequence patterns are also significantly affected by the
change to require explicit qualification of capture patterns and value
constraints, as it means ``case [a, b, c]:`` must instead be written as
``case [as a, as b, as c]:`` and ``case [0, 1]:`` must instead be written as
``case [== 0, == 1]:``.

With the syntax for sequence patterns no longer being derived directly from the
syntax for iterable unpacking, it no longer made sense to keep the syntactic
flexibility that had been included in the original syntax proposal purely for
consistency with iterable unpacking.

Allowing open and tuple style sequence patterns didn't increase expressivity,
only ambiguity of intent (especially relative to group patterns), and encouraged
readers down the path of viewing pattern matching syntax as intrinsically linked
to assignment target syntax (which the PEP 634 authors have stated multiple
times is not a desirable path to have readers take, and a view the author of
this PEP now shares, despite disagreeing with it originally).


Changes to mapping patterns
---------------------------

This PEP makes two notable changes to mapping patterns relative to PEP 634:

* value capturing is written as ``KEY as NAME`` rather than as ``KEY: NAME``
* a wider range of keys are permitted: any "closed expression", rather than
  only literals and attribute references

As discussed above, the first change is part of ensuring that all binding
operations with the target name to the right of a subexpression or pattern
use the ``as`` keyword.

The second change is mostly a matter of simplifying the parser and code
generator code by reusing the existing expression handling machinery. The
restriction to closed expressions is designed to help reduce ambiguity as to
where the key expression ends and the match pattern begins. This mostly allows
a superset of what PEP 634 allows, except that complex literals must be written
in parentheses (at least for now).

Adapting PEP 635's mapping pattern examples to the syntax proposed in this PEP::

    match json_pet:
        case {"type": == "cat", "name" as name, "pattern" as pattern}:
            return Cat(name, pattern)
        case {"type": == "dog", "name" as name, "breed" as breed}:
            return Dog(name, breed)
        case __:
            raise ValueError("Not a suitable pet")

    def change_red_to_blue(json_obj):
        match json_obj:
            case { 'color': (== 'red' | == '#FF0000') }:
                json_obj['color'] = 'blue'
            case { 'children' as children }:
                for child in children:
                    change_red_to_blue(child)

For reference, the equivalent PEP 634 syntax::

    match json_pet:
        case {"type": "cat", "name": name, "pattern": pattern}:
            return Cat(name, pattern)
        case {"type": "dog", "name": name, "breed": breed}:
            return Dog(name, breed)
        case _:
            raise ValueError("Not a suitable pet")

    def change_red_to_blue(json_obj):
        match json_obj:
            case { 'color': ('red' | '#FF0000') }:
                json_obj['color'] = 'blue'
            case { 'children': children }:
                for child in children:
                    change_red_to_blue(child)


Changes to class patterns
-------------------------

This PEP makes several notable changes to class patterns relative to PEP 634:

* the syntactic alignment with class instantiation is abandoned as being
  actively misleading and unhelpful. Instead, a new dedicated syntax for
  checking additional attributes is introduced that draws inspiration from
  mapping patterns rather than class instantiation
* a new dedicated syntax for simple ducktyping that will work for any class
  is introduced
* the special casing of various builtin and standard library types is
  supplemented by a general check for the existence of a ``__match_args__``
  attribute with the value of ``None``

As discussed above, the first change has two purposes:

* it's part of ensuring that all binding operations with the target name to the
  right of a subexpression or pattern use the ``as`` keyword. Using ``=`` to
  assign to the right is particularly problematic.
* it's part of ensuring that all uses of simple names in patterns have a prefix
  that indicates their purpose (in this case, a leading ``.`` to indicate an
  attribute lookup)

The syntactic alignment with class instantiation was also judged to be unhelpful
in general, as class patterns are about matching patterns against attributes,
while class instantiation is about matching call arguments to parameters in
class constructors, which may not bear much resemblance to the resulting
instance attributes at all.

The second change is intended to make it easier to use pattern matching for the
"ducktyping" style checks that are already common in Python.

The concrete syntax proposal for these patterns then arose from viewing
instances as mappings of attribute names to values, and combining the attribute
lookup syntax (``.ATTR``), with the mapping pattern syntax ``{KEY: PATTERN}``
to give ``cls{.ATTR: PATTERN}``.

Allowing ``cls{.ATTR}`` to mean the same thing as ``cls{.ATTR: __}`` was a
matter of considering the leading ``.`` sufficient to render the name usage
unambiguous (it's clearly an attribute reference, whereas matching against a variable
key in a mapping pattern would be arguably ambiguous).

The final change just supplements a CPython-internal-only check in the PEP 634
reference implementation by making it the default behaviour that classes get if
they don't define ``__match_args__`` (the optimised fast path for the builtin
and standard library types named in PEP 634 is retained).

Adapting the class matching example
|
||
`linked from PEP 635 <https://github.com/gvanrossum/patma/blob/master/examples/expr.py#L231>`_
|
||
shows that for purely positional class matching, the main impact comes from the
|
||
changes to value constraints and name binding, not from the class matching
|
||
changes::
|
||
|
||
match expr:
|
||
case BinaryOp(== '+', as left, as right):
|
||
return eval_expr(left) + eval_expr(right)
|
||
case BinaryOp(== '-', as left, as right):
|
||
return eval_expr(left) - eval_expr(right)
|
||
case BinaryOp(== '*', as left, as right):
|
||
return eval_expr(left) * eval_expr(right)
|
||
case BinaryOp(== '/', as left, as right):
|
||
return eval_expr(left) / eval_expr(right)
|
||
case UnaryOp(== '+', as arg):
|
||
return eval_expr(arg)
|
||
case UnaryOp(== '-', as arg):
|
||
return -eval_expr(arg)
|
||
case VarExpr(as name):
|
||
raise ValueError(f"Unknown value of: {name}")
|
||
case float() | int():
|
||
return expr
|
||
case __:
|
||
raise ValueError(f"Invalid expression value: {repr(expr)}")
|
||
|
||
For reference, the equivalent PEP 634 syntax::
|
||
|
||
match expr:
|
||
case BinaryOp('+', left, right):
|
||
return eval_expr(left) + eval_expr(right)
|
||
case BinaryOp('-', left, right):
|
||
return eval_expr(left) - eval_expr(right)
|
||
case BinaryOp('*', left, right):
|
||
return eval_expr(left) * eval_expr(right)
|
||
case BinaryOp('/', left, right):
|
||
return eval_expr(left) / eval_expr(right)
|
||
case UnaryOp('+', arg):
|
||
return eval_expr(arg)
|
||
case UnaryOp('-', arg):
|
||
return -eval_expr(arg)
|
||
case VarExpr(name):
|
||
raise ValueError(f"Unknown value of: {name}")
|
||
case float() | int():
|
||
return expr
|
||
case _:
|
||
raise ValueError(f"Invalid expression value: {repr(expr)}")
|
||
|
||
The changes to the class pattern syntax itself are more relevant when
checking for named attributes and extracting their values without relying on
``__match_args__``::

    match expr:
        case object{.host as host, .port as port}:
            pass
        case object{.host as host}:
            pass

Compare this to the PEP 634 equivalent, where it really isn't clear which names
are referring to attributes of the match subject and which names are referring
to local variables::

    match expr:
        case object(host=host, port=port):
            pass
        case object(host=host):
            pass

In this specific case, that ambiguity doesn't matter (since the attribute and
variable names are the same), but in the general case, knowing which is which
will be critical to reasoning correctly about the code being read.


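To make the distinction concrete, the following sketch uses PEP 634 syntax with
deliberately different attribute and variable names, so it can be run on an
interpreter that implements that PEP. The ``Endpoint`` class and the
``describe`` function are hypothetical names introduced purely for
illustration::

    from dataclasses import dataclass


    @dataclass
    class Endpoint:  # hypothetical example class
        host: str
        port: int = 80


    def describe(obj):
        match obj:
            # Left of "=": attribute looked up on the match subject.
            # Right of "=": local variable that gets bound on success.
            case Endpoint(host=name, port=number):
                return f"{name}:{number}"
            case _:
                return "unknown"


    print(describe(Endpoint("example.org", 8080)))

Swapping the two sides of either ``=`` changes which attribute is read and
which local name is bound, and nothing in the syntax itself signals that.

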
Deferred Ideas
==============

Inferred value constraints
--------------------------

As discussed above, this PEP doesn't rule out the possibility of adding
inferred equality and identity constraints in the future.

These could be particularly valuable for literals, as it is quite likely that
many "magic" strings and numbers with self-evident meanings will be written
directly into match patterns, rather than being stored in named variables.
(Think constants like ``None``, or obviously special numbers like ``0`` and
``1``, or strings where their contents are as descriptive as any variable name,
rather than cryptic checks against opaque numbers like ``739452``.)


Making some required parentheses optional
-----------------------------------------

The PEP currently errs heavily on the side of requiring parentheses in the face
of potential ambiguity.

However, there are a number of cases where it at least arguably goes too far,
mostly involving AS patterns with an explicit pattern.

In any position that requires a closed pattern, AS patterns may end up starting
with doubled parentheses, as the nested pattern is also required to be a closed
pattern: ``((OPEN PTRN) as NAME)``

Due to the requirement that the subpattern be closed, it should be reasonable
in many of these cases (e.g. sequence pattern subpatterns) to accept
``CLOSED_PTRN as NAME`` directly.

Further consideration of this point has been deferred, as making required
parentheses optional is a backwards compatible change, and hence relaxing the
restrictions later can be considered on a case by case basis.


Accepting complex literals as closed expressions
------------------------------------------------

PEP 634's reference implementation includes a lot of special casing of binary
operations in both the parser and the rest of the compiler in order to accept
complex literals without accepting arbitrary binary numeric operations on
literal values.

Ideally, this problem would be dealt with at the parser layer, with the parser
directly emitting a Constant AST node prepopulated with a complex number. If
that was the way things worked, then complex literals could be accepted through
a similar mechanism to any other literal.

This isn't how complex literals are handled, however. Instead, they're passed
through to the AST as regular ``BinOp`` nodes, and then the constant folding
pass on the AST resolves them down to ``Constant`` nodes with a complex value.

For the parser to resolve complex literals directly, the compiler would need to
be able to tell the tokenizer to generate a distinct token type for
imaginary numbers (e.g. ``INUMBER``), which would then allow the parser to
handle ``NUMBER + INUMBER`` and ``NUMBER - INUMBER`` separately from other
binary operations.

Alternatively, a new ``ComplexNumber`` AST node type could be defined, which
would allow the parser to notify the subsequent compiler stages that a
particular node should specifically be a complex literal, rather than an
arbitrary binary operation. Then the parser could accept ``NUMBER + NUMBER``
and ``NUMBER - NUMBER`` for that node, while letting the AST validation for
``ComplexNumber`` take care of ensuring that the real and imaginary parts of
the literal were real and imaginary numbers as expected.

For now, this PEP has postponed dealing with this question, and instead just
requires that complex literals be parenthesised in order to be used in value
constraints and as mapping pattern keys.


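The status quo described above can be observed with the standard ``ast``
module. This is only an illustrative sketch of how a complex "literal" reaches
the AST as a binary operation; it is not part of the proposal::

    import ast

    # Parse the expression without compiling it, so no folding has happened yet
    tree = ast.parse("1 + 2j", mode="eval")
    node = tree.body

    # The parser emits an ordinary binary operation...
    print(type(node).__name__)
    # ...whose right operand is the only genuinely imaginary literal
    print(type(node.right).__name__, node.right.value)

    # Evaluating the compiled expression still yields a single complex value
    result = eval(compile(tree, "<example>", "eval"))
    print(result)

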
Allowing negated constraints in match patterns
----------------------------------------------

With the syntax proposed in this PEP, it isn't permitted to write ``!= expr``
or ``is not expr`` as a match pattern.

Both of these forms have clear potential interpretations as a negated equality
constraint (i.e. ``x != expr``) and a negated identity constraint
(i.e. ``x is not expr``).

However, it's far from clear either form would come up often enough to justify
the dedicated syntax, so the possible extension has been deferred pending further
community experience with match statements.


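In the meantime, negated checks can be expressed as a capture combined with a
guard expression. The sketch below uses PEP 634 syntax so that it can be run on
an interpreter that implements that PEP; ``describe`` is a hypothetical
function used only for illustration::

    def describe(value):
        match value:
            # Stands in for a hypothetical "is not None" plus "!= 0" constraint
            case x if x is not None and x != 0:
                return "set and non-zero"
            # Stands in for a hypothetical "is not None" constraint
            case x if x is not None:
                return "set but zero"
            case _:
                return "unset"

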
.. _membership constraints:

Allowing membership checks in match patterns
---------------------------------------------

The syntax used for equality and identity constraints would be straightforward
to extend to membership checks: ``in container``.

One downside of the proposals in both this PEP and PEP 634 is that checking
for multiple values in the same case doesn't look like any existing container
membership check in Python::

    # PEP 634's literal patterns
    match value:
        case 0 | 1 | 2 | 3:
            ...

    # This PEP's equality constraints
    match value:
        case == 0 | == 1 | == 2 | == 3:
            ...

Allowing inferred equality constraints under this PEP would only make it look
like the PEP 634 example; it still wouldn't look like the equivalent ``if``
statement header (``if value in {0, 1, 2, 3}:``).

Membership constraints would provide a more explicit, but still concise, way
to check if the match subject was present in a container, and it would look
the same as an ordinary containment check::

    match value:
        case in {0, 1, 2, 3}:
            ...
        case in {one, two, three, four}:
            ...
        case in range(4): # It would accept any container, not just literal sets
            ...

Such a feature would also be readily extensible to allow all kinds of case
clauses without any further syntax updates, simply by defining ``__contains__``
appropriately on a custom class definition.

However, while this does seem like a useful extension, and a good way to resolve
this PEP's verbosity problem when combining multiple equality checks in an
OR pattern, it isn't essential to making match statements a valuable addition
to the language, so it seems more appropriate to defer it to a separate proposal,
rather than including it here.


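As a rough approximation of what membership constraints would offer, the same
checks can be written as guard expressions. The sketch below uses PEP 634
syntax so that it can be run on an interpreter that implements that PEP; the
``EvenNumbers`` class and ``bucket`` function are hypothetical names, with the
class showing how ``__contains__`` alone decides membership::

    class EvenNumbers:
        """Hypothetical container defined purely by its __contains__ method."""

        def __contains__(self, item):
            return isinstance(item, int) and item % 2 == 0


    def bucket(value):
        match value:
            # Stands in for a hypothetical "case in {0, 1, 2, 3}:"
            case n if n in {0, 1, 2, 3}:
                return "small"
            # Stands in for a hypothetical "case in EvenNumbers():"
            case n if n in EvenNumbers():
                return "even"
            case _:
                return "other"

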
Inferring a default type for instance attribute constraints
-----------------------------------------------------------

The dedicated syntax for instance attribute constraints means that ``object``
could be omitted from ``object{.ATTR}`` to give ``{.ATTR}`` without introducing
any syntactic ambiguity (if no class was given, ``object`` would be implied,
just as it is for the base class list in class definitions).

However, it's far from clear saving six characters is worth making it harder to
visually distinguish mapping patterns from instance attribute patterns, so
allowing this has been deferred as a topic for possible future consideration.


Avoiding special cases in sequence patterns
-------------------------------------------

Sequence patterns in both this PEP and PEP 634 currently special case ``str``,
``bytes``, and ``bytearray`` as specifically *never* matching a sequence
pattern.

This special casing could potentially be removed if we were to define a new
``collections.abc.AtomicSequence`` abstract base class for types like these,
where they're conceptually a single item, but still implement the sequence
protocol to allow random access to their component parts.


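The special casing is easy to observe in practice. The sketch below uses
PEP 634 syntax so that it can be run on an interpreter that implements that
PEP; ``kind`` is a hypothetical helper function. It shows that ``str`` is
registered as a ``collections.abc.Sequence`` and yet is never accepted by a
sequence pattern::

    from collections.abc import Sequence


    def kind(value):
        match value:
            case [*_]:  # Matches a sequence of any length
                return "sequence"
            case _:
                return "not a sequence"


    print(isinstance("ab", Sequence))  # str implements the sequence protocol
    print(kind([1, 2]), kind((1,)))    # ordinary sequences match
    print(kind("ab"), kind(b"ab"))     # str and bytes are excluded

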
Expression syntax to retrieve multiple attributes from an instance
------------------------------------------------------------------

The instance attribute pattern syntax has been designed such that it could
be used as the basis for a general purpose syntax for retrieving multiple
attributes from an object in a single expression::

    host, port = obj{.host, .port}

Similar to slice syntax only being allowed inside bracket subscripts, the
``.attr`` syntax for naming attributes would only be allowed inside brace
subscripts.

This idea isn't required for pattern matching to be useful, so it isn't part of
this PEP. However, it's mentioned as a possible path towards making pattern
matching feel more integrated into the rest of the language, rather than
existing forever in its own completely separated world.


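For comparison, retrieving several attributes in one expression is already
possible with the standard library's ``operator.attrgetter``, at the cost of
naming the attributes as strings. This is an existing facility rather than the
syntax sketched above, and the ``obj`` instance here is a hypothetical
stand-in::

    from operator import attrgetter
    from types import SimpleNamespace

    obj = SimpleNamespace(host="example.org", port=8080)

    # attrgetter with several names returns a tuple of the attribute values
    host, port = attrgetter("host", "port")(obj)
    print(host, port)

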
Expression syntax to retrieve multiple items from a container
------------------------------------------------------------------

If the brace subscript syntax were to be accepted for instance attribute
pattern matching, and then subsequently extended to offer general purpose
extraction of multiple attributes, then it could be extended even further to
allow for retrieval of multiple items from containers based on the syntax
used for mapping pattern matching::

    host, port = obj{"host", "port"}
    first, last = obj{0, -1}

Again, this idea isn't required for pattern matching to be useful, so it isn't
part of this PEP. As with retrieving multiple attributes, however, it is
included as an example of the proposed pattern matching syntax inspiring ideas
for making object deconstruction easier in general.


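The closest existing equivalent is ``operator.itemgetter``, which accepts
several keys or indices and returns the corresponding items as a tuple. The
``config`` and ``letters`` objects below are hypothetical examples, and this is
the current standard library approach rather than the brace subscript idea::

    from operator import itemgetter

    config = {"host": "example.org", "port": 8080}
    letters = ["a", "b", "c"]

    # Several mapping keys in one call
    host, port = itemgetter("host", "port")(config)
    # Several sequence indices in one call
    first, last = itemgetter(0, -1)(letters)
    print(host, port, first, last)

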
Rejected Ideas
==============

Restricting permitted expressions in value constraints and mapping pattern keys
-------------------------------------------------------------------------------

While it's entirely technically possible to restrict the kinds of expressions
permitted in value constraints and mapping pattern keys to just attribute
lookups and constant literals (as PEP 634 does), there isn't any clear runtime
value in doing so, so this PEP proposes allowing any kind of primary expression
(primary expressions are an existing node type in the grammar that includes
things like literals, names, attribute lookups, function calls, container
subscripts, parenthesised groups, etc), as well as high precedence unary
operations (``+``, ``-``, ``~``) on primary expressions.

While PEP 635 does emphasise several times that literal patterns and value
patterns are not full expressions, it doesn't ever articulate a concrete benefit
that is obtained from that restriction (just a theoretical appeal to it being
useful to separate static checks from dynamic checks, which a code style
tool could still enforce, even if the compiler itself is more permissive).

The last time we imposed such a restriction was for decorator expressions and
the primary outcome of that was that users had to put up with years of awkward
syntactic workarounds (like nesting arbitrary expressions inside function calls
that just returned their argument) to express the behaviour they wanted before
the language definition was finally updated to allow arbitrary expressions and
let users make their own decisions about readability.

The situation in PEP 634 resembles the situation with decorator expressions:
arbitrary expressions are technically supported in value patterns, but they
require awkward workarounds. Either all the values to match need to be
specified in a helper class that is placed before the match statement::

    # Allowing arbitrary match targets with PEP 634's value pattern syntax
    class mt:
        value = func()
    match expr:
        case (_, mt.value):
            ... # Handle the case where 'expr[1] == func()'

Or else they need to be written as a combination of a capture pattern and a
guard expression::

    # Allowing arbitrary match targets with PEP 634's guard expressions
    match expr:
        case (_, _matched) if _matched == func():
            ... # Handle the case where 'expr[1] == func()'

This PEP proposes avoiding the need for any such workarounds, and instead
supporting arbitrary value constraints from the start::

    match expr:
        case (__, == func()):
            ... # Handle the case where 'expr[1] == func()'

Whether actually writing that kind of code is a good idea would be a topic for
style guides and code linters, not the language compiler.

In particular, if static analysers can't follow certain kinds of dynamic checks,
then they can limit the permitted expressions at analysis time, rather than the
compiler restricting them at compile time.

There are also some kinds of expressions that are almost certain to give
nonsensical results (e.g. ``yield``, ``yield from``, ``await``) due to the
pattern caching rule, where the number of times the constraint expression
actually gets evaluated will be implementation dependent. Even here, the PEP
takes the view of letting users write nonsense if they really want to.

Aside from the recently updated decorator expressions, another situation where
Python's formal syntax offers full freedom of expression that is almost never
used in practice is in ``except`` clauses: the exceptions to match against
almost always take the form of a simple name, a dotted name, or a tuple of
those, but the language grammar permits arbitrary expressions at that point.
This is a good indication that Python's user base can be trusted to
take responsibility for finding readable ways to use permissive language
features, by avoiding writing hard to read constructs even when they're
permitted by the compiler.

This permissiveness comes with a real concrete benefit on the implementation
side: dozens of lines of match statement specific code in the compiler are
replaced by simple calls to the existing code for compiling expressions
(including in the AST validation pass, the AST optimization pass, the symbol
table analysis pass, and the code generation pass). This implementation
benefit would accrue not just to CPython, but to every other Python
implementation looking to add match statement support.


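The freedom available in ``except`` clauses mentioned above can be seen in a
short sketch. The ``errors_to_ignore`` and ``lookup`` functions are
hypothetical names introduced for illustration; the only point being made is
that the grammar accepts a function call where an exception class is normally
written::

    def errors_to_ignore(strict):
        """Hypothetical helper choosing which lookup errors to suppress."""
        return (KeyError,) if strict else (KeyError, IndexError)


    def lookup(container, key, strict=False):
        try:
            return container[key]
        # An arbitrary expression is permitted here, not just a name or tuple
        except errors_to_ignore(strict):
            return None


    print(lookup({}, "missing"))
    print(lookup([], 3))

Code like this is legal but rarely written, which is the behaviour the PEP
expects to see with permissive value constraints as well.

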
Requiring the use of constraint prefix markers for mapping pattern keys
-----------------------------------------------------------------------

The initial (unpublished) draft of this proposal suggested requiring mapping
pattern keys be value constraints, just as PEP 634 requires that they be valid
literal or value patterns::

    import constants

    match config:
        case {== "route": route}:
            process_route(route)
        case {== constants.DEFAULT_PORT: sub_config, **rest}:
            process_config(sub_config, rest)

However, the extra characters were syntactically noisy and unlike its use in
value constraints (where it distinguishes them from non-pattern expressions),
the prefix doesn't provide any additional information here that isn't already
conveyed by the expression's position as a key within a mapping pattern.

Accordingly, the proposal was simplified to omit the marker prefix from mapping
pattern keys.

This omission also aligns with the fact that containers may incorporate both
identity and equality checks into their lookup process - they don't purely
rely on equality checks, as would be incorrectly implied by the use of the
equality constraint prefix.


Allowing the key/value separator to be omitted for mapping value constraints
----------------------------------------------------------------------------

Instance attribute patterns allow the ``:`` separator to be omitted when
writing attribute value constraints like ``case object{.attr == expr}``.

Offering a similar shorthand for mapping value constraints was considered, but
permitting it allows thoroughly baffling constructs like ``case {0 == 0}:``
where the compiler knows this is the key ``0`` with the value constraint
``== 0``, but a human reader sees the tautological comparison operation
``0 == 0``. With the key/value separator included, the intent is more obvious to
a human reader as well: ``case {0: == 0}:``


Reference Implementation
========================

A draft reference implementation for this PEP [3_] has been derived from Brandt
Bucher's reference implementation for PEP 634 [4_].

Relative to the text of this PEP, the draft reference implementation has not
yet complemented the special casing of several builtin and standard library
types in ``MATCH_CLASS`` with the more general check for ``__match_args__``
being set to ``None``. Class defined patterns also currently still accept
classes that don't define ``__match_args__``.

All other modified patterns have been updated to follow this PEP rather than
PEP 634.

Unparsing for match patterns has not yet been migrated to the updated v3 AST.

The AST validator for match patterns has not yet been implemented.

The AST validator in general has not yet been reviewed to ensure that it is
checking that only expression nodes are being passed in where expression nodes
are expected.

The examples in this PEP have not yet been converted to test cases, so could
plausibly contain typos and other errors.

Several of the old PEP 634 tests are still to be converted to new SyntaxError
tests.

The documentation has not yet been updated.


Acknowledgments
===============

The PEP 622 and PEP 634/635/636 authors, as the proposal in this PEP is merely
an attempt to improve the readability of an already well-constructed idea by
proposing that starting with a more explicit syntax and potentially introducing
syntactic shortcuts for particularly common operations later is a better option
than attempting to *only* define the shortcut version. For areas of the
specification where the two PEPs are the same (or at least very similar), the
text describing the intended behaviour in this PEP is often derived directly
from the PEP 634 text.

Steven D'Aprano, who made a compelling case that the key goals of this PEP could
be achieved by using existing comparison tokens to provide the ability to override
the compiler when our guesses as to "what most users will want most of the time"
are inevitably incorrect for at least some users some of the time, and retaining
some of PEP 634's syntactic sugar (with a slightly different semantic definition)
to obtain the same level of brevity as PEP 634 in most situations. (Paul
Sokolovsky also independently suggested using ``==`` instead of ``?`` as a
more easily understood prefix for equality constraints.)

Thomas Wouters, whose publication of PEP 640 and public review of the structured
pattern matching proposals persuaded the author of this PEP to continue
advocating for a wildcard pattern syntax that a future PEP could plausibly turn
into a hard keyword that always skips binding a reference in any location a
simple name is expected, rather than continuing indefinitely as the match
pattern specific soft keyword that is proposed here.

Joao Bueno and Jim Jewett for nudging the PEP author to take a closer look at
the proposed syntax for subelement capturing within class patterns and mapping
patterns (particularly the problems with "capturing to the right"). This
review is what prompted the significant changes between v2 and v3 of the
proposal.


References
==========

.. [1] Post explaining the syntactic novelties in PEP 622
   https://mail.python.org/archives/list/python-dev@python.org/message/2VRPDW4EE243QT3QNNCO7XFZYZGIY6N3/

.. [2] Declined pull request proposing to list this as a Rejected Idea in PEP 622
   https://github.com/python/peps/pull/1564

.. [3] In-progress reference implementation for this PEP
   https://github.com/ncoghlan/cpython/tree/pep-642-constraint-patterns

.. [4] PEP 634 reference implementation
   https://github.com/python/cpython/pull/22917

.. [5] Steven D'Aprano's cogent criticism of the first published iteration of this PEP
   https://mail.python.org/archives/list/python-dev@python.org/message/BTHFWG6MWLHALOD6CHTUFPHAR65YN6BP/

.. [6] Thomas Wouters' initial review of the structured pattern matching proposals
   https://mail.python.org/archives/list/python-dev@python.org/thread/4SBR3J5IQUYE752KR7C6432HNBSYKC5X/

.. [7] Stack Overflow answer regarding the use cases for ``_`` as an identifier
   https://stackoverflow.com/questions/5893163/what-is-the-purpose-of-the-single-underscore-variable-in-python/5893946#5893946

.. [8] Pre-publication draft of "Precise Semantics for Pattern Matching"
   https://github.com/markshannon/pattern-matching/blob/master/precise_semantics.rst

.. [9] Kohn et al., Dynamic Pattern Matching with Python
   https://gvanrossum.github.io/docs/PyPatternMatching.pdf


.. _Appendix A:

Appendix A -- Full Grammar
==========================

Here is the full modified grammar for ``match_stmt``, replacing Appendix A
in PEP 634.

Notation used beyond standard EBNF is as per PEP 617:

- ``'KWD'`` denotes a hard keyword
- ``"KWD"`` denotes a soft keyword
- ``SEP.RULE+`` is shorthand for ``RULE (SEP RULE)*``
- ``!RULE`` is a negative lookahead assertion

::

    match_stmt: "match" subject_expr ':' NEWLINE INDENT case_block+ DEDENT
    subject_expr:
        | star_named_expression ',' [star_named_expressions]
        | named_expression
    case_block: "case" (guarded_pattern | open_pattern) ':' block

    guarded_pattern: closed_pattern 'if' named_expression
    open_pattern: # Pattern may use multiple tokens with no closing delimiter
        | as_pattern
        | or_pattern

    as_pattern: [closed_pattern] pattern_as_clause
    as_pattern_with_inferred_wildcard: pattern_as_clause
    pattern_as_clause: 'as' pattern_capture_target
    pattern_capture_target: !"__" NAME !('.' | '(' | '=')

    or_pattern: '|'.simple_pattern+

    simple_pattern: # Subnode where "as" and "or" patterns must be parenthesised
        | closed_pattern
        | value_constraint

    value_constraint:
        | eq_constraint
        | id_constraint

    eq_constraint: '==' closed_expr
    id_constraint: 'is' closed_expr

    closed_expr: # Require a single token or a closing delimiter in expression
        | primary
        | closed_factor

    closed_factor: # "factor" is the main grammar node for these unary ops
        | '+' primary
        | '-' primary
        | '~' primary

    closed_pattern: # Require a single token or a closing delimiter in pattern
        | wildcard_pattern
        | group_pattern
        | structural_constraint

    wildcard_pattern: "__"

    group_pattern: '(' open_pattern ')'

    structural_constraint:
        | sequence_constraint
        | mapping_constraint
        | attrs_constraint
        | class_constraint

    sequence_constraint: '[' [sequence_constraint_elements] ']'
    sequence_constraint_elements: ','.sequence_constraint_element+ ','?
    sequence_constraint_element:
        | star_pattern
        | simple_pattern
        | as_pattern_with_inferred_wildcard
    star_pattern: '*' (pattern_as_clause | wildcard_pattern)

    mapping_constraint: '{' [mapping_constraint_elements] '}'
    mapping_constraint_elements: ','.key_value_constraint+ ','?
    key_value_constraint:
        | closed_expr pattern_as_clause
        | closed_expr ':' simple_pattern
        | double_star_capture
    double_star_capture: '**' pattern_as_clause

    attrs_constraint:
        | name_or_attr '{' [attrs_constraint_elements] '}'
    name_or_attr: attr | NAME
    attr: name_or_attr '.' NAME
    attrs_constraint_elements: ','.attr_value_constraint+ ','?
    attr_value_constraint:
        | '.' NAME pattern_as_clause
        | '.' NAME value_constraint
        | '.' NAME ':' simple_pattern
        | '.' NAME

    class_constraint:
        | name_or_attr '(' ')'
        | name_or_attr '(' positional_patterns ','? ')'
        | name_or_attr '(' class_constraint_attrs ')'
        | name_or_attr '(' positional_patterns ',' class_constraint_attrs ')'
    positional_patterns: ','.positional_pattern+
    positional_pattern:
        | simple_pattern
        | as_pattern_with_inferred_wildcard
    class_constraint_attrs:
        | '**' '{' [attrs_constraint_elements] '}'


.. _Appendix B:

Appendix B: Summary of Abstract Syntax Tree changes
===================================================

The following new nodes are added to the AST by this PEP::

    stmt = ...
         | ...
         | Match(expr subject, match_case* cases)
         | ...
         ...

    match_case = (pattern pattern, expr? guard, stmt* body)

    pattern = MatchAlways
         | MatchValue(matchop op, expr value)
         | MatchSequence(pattern* patterns)
         | MatchMapping(expr* keys, pattern* patterns)
         | MatchAttrs(expr cls, identifier* attrs, pattern* patterns)
         | MatchClass(expr cls, pattern* patterns, identifier* extra_attrs, pattern* extra_patterns)

         | MatchRestOfSequence(identifier? target)
         -- A NULL entry in the MatchMapping key list handles capturing extra mapping keys

         | MatchAs(pattern? pattern, identifier target)
         | MatchOr(pattern* patterns)

          attributes (int lineno, int col_offset, int? end_lineno, int? end_col_offset)

    matchop = EqCheck | IdCheck


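For comparison, on an interpreter that implements PEP 634's match statement,
the standard ``ast`` module can be used to inspect which pattern nodes a given
statement produces. The sketch below reports that implementation's node names,
which are not the same as the nodes listed above for this PEP; the ``source``
snippet is a hypothetical example::

    import ast

    source = """
    match command:
        case 0:
            pass
        case [first, *rest]:
            pass
        case _:
            pass
    """

    # Strip the indentation introduced by embedding the snippet in this block
    tree = ast.parse("\n".join(line[4:] for line in source.splitlines()))
    match_stmt = tree.body[0]

    # Collect the class name of each top level pattern node
    pattern_names = [type(case.pattern).__name__ for case in match_stmt.cases]
    print(type(match_stmt).__name__, pattern_names)

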
.. _Appendix C:

Appendix C: Summary of changes relative to PEP 634
==================================================

The overall ``match``/``case`` statement syntax and the guard expression syntax
remain the same as they are in PEP 634.

Relative to PEP 634 this PEP makes the following key changes:

* a new ``pattern`` type is defined in the AST, rather than reusing the ``expr``
  type for patterns
* the new ``MatchAs`` and ``MatchOr`` AST nodes are moved from the ``expr``
  type to the ``pattern`` type
* the wildcard pattern changes from ``_`` (single underscore) to ``__`` (double
  underscore), and gains a dedicated ``MatchAlways`` node in the AST
* due to ambiguity of intent, value patterns and literal patterns are removed
* a new expression category is introduced: "closed expressions"
* closed expressions are either primary expressions, or a primary expression
  preceded by one of the high precedence unary operators (``+``, ``-``, ``~``)
* a new pattern type is introduced: "value constraint patterns"
* value constraints have a dedicated ``MatchValue`` AST node rather than
  allowing a combination of ``Constant`` (literals), ``UnaryOp``
  (negative numbers), ``BinOp`` (complex numbers), and ``Attribute`` (attribute
  lookups)
* value constraint patterns are either equality constraints or identity constraints
* equality constraints use ``==`` as a prefix marker on an otherwise
  arbitrary closed expression: ``== EXPR``
* identity constraints use ``is`` as a prefix marker on an otherwise
  arbitrary closed expression: ``is EXPR``
* due to ambiguity of intent, capture patterns are removed. All capture operations
  use the ``as`` keyword (even in sequence matching) and are represented in the
  AST as either ``MatchAs`` or ``MatchRestOfSequence`` nodes.
* to reduce verbosity in AS patterns, ``as NAME`` is permitted, with the same
  meaning as ``__ as NAME``
* sequence patterns change to *require* the use of square brackets, rather than
  offering the same syntactic flexibility as assignment targets (assignment
  statements allow iterable unpacking to be indicated by any use of a tuple
  separated target, with or without surrounding parentheses or square brackets)
* sequence patterns gain a dedicated ``MatchSequence`` AST node rather than
  reusing ``List``
* mapping patterns change to allow arbitrary closed expressions as keys
* mapping patterns gain a dedicated ``MatchMapping`` AST node rather than
  reusing ``Dict``
* to reduce verbosity in mapping patterns, ``KEY : __ as NAME`` may be shortened
  to ``KEY as NAME``
* class patterns no longer use individual keyword argument syntax for attribute
  matching. Instead they use double-star syntax, along with a variant on mapping
  pattern syntax with a dot prefix on the attribute names
* class patterns gain a dedicated ``MatchClass`` AST node rather than
  reusing ``Call``
* to reduce verbosity, class attribute matching allows ``:`` to be omitted when
  the pattern to be matched starts with ``==``, ``is``, or ``as``
* class patterns treat any class that sets ``__match_args__`` to ``None`` as
  accepting a single positional pattern that is matched against the entire
  object (avoiding the special casing required in PEP 634)
* class patterns raise ``TypeError`` when used with an object that does not
  define ``__match_args__``
* dedicated syntax for ducktyping is added, such that ``case cls{...}:`` is
  roughly equivalent to ``case cls(**{...}):``, but skips the check for the
  existence of ``__match_args__``. This pattern also has a dedicated AST node,
  ``MatchAttrs``

Note that postponing literal patterns also makes it possible to postpone the
question of whether we need an "INUMBER" token in the tokeniser for imaginary
literals. Without it, the parser can't distinguish complex literals from other
binary addition and subtraction operations on constants, so proposals like
PEP 634 have to do work in later compilation steps to check for correct usage.


.. _Appendix D:

Appendix D: History of changes to this proposal
===============================================

The first published iteration of this proposal mostly followed PEP 634, but
suggested using ``?EXPR`` for equality constraints and ``?is EXPR`` for
identity constraints rather than PEP 634's value patterns and literal patterns.

The second published iteration mostly adopted a counter-proposal from Steven
D'Aprano that kept the PEP 634 style inferred constraints in many situations,
but also allowed the use of ``== EXPR`` for explicit equality constraints, and
``is EXPR`` for explicit identity constraints.

The third published (and current) iteration dropped inferred patterns entirely,
in an attempt to resolve the concerns with the fact that the patterns
``case {key: NAME}:`` and ``case cls(attr=NAME):`` would both bind ``NAME``
despite it appearing to the right of another subexpression without using the
``as`` keyword. The revised proposal also eliminates the possibility of writing
``case TARGET1 as TARGET2:``, which would bind to both of the given names. Of
those changes, the most concerning was ``case cls(attr=TARGET_NAME):``, since it
involved the use of ``=`` with the binding target on the right, the exact
opposite of what happens in assignment statements, function calls, and
function signature declarations.


Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: