PEP: 642
Title: Explicit Pattern Syntax for Structural Pattern Matching
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan
BDFL-Delegate:
Discussions-To: Python-Dev
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Requires: 634
Created: 26-Sep-2020
Python-Version: 3.10
Post-History: 31-Oct-2020, 8-Nov-2020, 3-Jan-2021
Resolution:

Abstract
========

This PEP covers an alternative syntax proposal for PEP 634's structural
pattern matching that requires explicit prefixes on all capture patterns and
value constraints. It also proposes a new dedicated syntax for instance
attribute patterns that aligns more closely with the proposed mapping pattern
syntax.

While the result is necessarily more verbose than the proposed syntax in
PEP 634, it is still significantly less verbose than the status quo.

As an example, the following match statement would extract "host" and "port"
details from a 2 item sequence, a mapping with "host" and "port" keys, any
object with "host" and "port" attributes, or a "host:port" string, treating
the "port" as optional in the latter three cases::

    port = DEFAULT_PORT
    match expr:
        case [as host, as port]:
            pass
        case {"host" as host, "port" as port}:
            pass
        case {"host" as host}:
            pass
        case object{.host as host, .port as port}:
            pass
        case object{.host as host}:
            pass
        case str{} as addr:
            host, __, optional_port = addr.partition(":")
            if optional_port:
                port = optional_port
        case __ as m:
            raise TypeError(f"Unknown address format: {m!r:.200}")
    port = int(port)

At a high level, this PEP proposes to categorise the different available
pattern types as follows:

* wildcard pattern: ``__``
* group patterns: ``(PTRN)``
* value constraint patterns:

  * equality constraints: ``== EXPR``
  * identity constraints: ``is EXPR``

* structural constraint patterns:

  * sequence constraint patterns: ``[PTRN, as NAME, PTRN as NAME]``
  * mapping constraint patterns: ``{EXPR: PTRN, EXPR as NAME}``
  * instance attribute constraint patterns:
    ``CLS{.NAME, .NAME: PTRN, .NAME == EXPR, .NAME as NAME}``
  * class defined constraint patterns:
    ``CLS(PTRN, PTRN, **{.NAME, .NAME: PTRN, .NAME == EXPR, .NAME as NAME})``

* OR patterns: ``PTRN | PTRN | PTRN``
* AS patterns: ``PTRN as NAME`` (omitting the pattern implies ``__``)

The intent of this approach is to:

* allow an initial form of pattern matching to be developed and released
  without needing to decide up front on the best default options for handling
  bare names, attribute lookups, and literal values
* ensure that pattern matching is defined explicitly at the Abstract Syntax
  Tree level, allowing the specifications of the semantics and the surface
  syntax for pattern matching to be clearly separated
* define a clear and concise "ducktyping" syntax that could potentially be
  adopted in ordinary expressions as a way to more easily retrieve a tuple
  containing multiple attributes from the same object

Relative to PEP 634, the proposal also deliberately eliminates any syntax that
"binds to the right" without using the ``as`` keyword (using capture patterns
in PEP 634's mapping patterns and class patterns) or binds to both the left
and the right in the same pattern (using PEP 634's capture patterns with AS
patterns).

Relationship with other PEPs
============================

This PEP both depends on and competes with PEP 634 - the PEP author agrees
that match statements would be a sufficiently valuable addition to the
language to be worth the additional complexity that they add to the learning
process, but disagrees with the idea that "simple name vs literal or attribute
lookup" really offers an adequate syntactic distinction between name binding
and value lookup operations in match patterns (at least for Python).

This PEP agrees with the spirit of PEP 640 (that the chosen wildcard pattern
to skip a name binding should be supported everywhere, not just in match
patterns), but is now proposing a different spelling for the wildcard syntax
(``__`` rather than ``?``).
As such, it competes with PEP 640 as written, but would complement a proposal
to deprecate the use of ``__`` as an ordinary identifier and instead turn it
into a general purpose wildcard marker that always skips making a new local
variable binding.

While it has not yet been put forward as a PEP, Mark Shannon has a pre-PEP
draft [8_] expressing several concerns about the runtime semantics of the
pattern matching proposal in PEP 634. This PEP is somewhat complementary to
that one, as even though this PEP is mostly about surface syntax changes
rather than major semantic changes, it does propose that the Abstract Syntax
Tree definition be made more explicit to better separate the details of the
surface syntax from the semantics of the code generation step.

There is one specific idea in that pre-PEP draft that this PEP explicitly
rejects: the idea that the different kinds of matching are mutually exclusive.
It's entirely possible for the same value to match different kinds of
structural pattern, and which one takes precedence will intentionally be
governed by the order of the cases in the match statement.

Motivation
==========

The original PEP 622 (which was later split into PEP 634, PEP 635, and
PEP 636) incorporated an unstated but essential assumption in its syntax
design: that neither ordinary expressions *nor* the existing assignment target
syntax provide an adequate foundation for the syntax used in match patterns.

While the PEP didn't explicitly state this assumption, one of the PEP authors
explained it clearly on python-dev [1_]:

    The actual problem that I see is that we have different
    cultures/intuitions fundamentally clashing here. In particular, so many
    programmers welcome pattern matching as an "extended switch statement"
    and find it therefore strange that names are binding and not expressions
    for comparison. Others argue that it is at odds with current assignment
    statements, say, and question why dotted names are _/not/_ binding.
    What all groups seem to have in common, though, is that they refer to
    _/their/_ understanding and interpretation of the new match statement as
    'consistent' or 'intuitive' --- naturally pointing out where we as PEP
    authors went wrong with our design.

    But here is the catch: at least in the Python world, pattern matching as
    proposed by this PEP is an unprecedented and new way of approaching a
    common problem. It is not simply an extension of something already
    there. Even worse: while designing the PEP we found that no matter from
    which angle you approach it, you will run into issues of seeming
    'inconsistencies' (which is to say that pattern matching cannot be
    reduced to a 'linear' extension of existing features in a meaningful
    way): there is always something that goes fundamentally beyond what is
    already there in Python. That's why I argue that arguments based on what
    is 'intuitive' or 'consistent' just do not make sense _/in this case/_.

The first iteration of this PEP was then born out of an attempt to show that
the second assertion was not accurate, and that match patterns could be
treated as a variation on assignment targets without leading to inherent
contradictions. (An earlier PR submitted to list this option in the "Rejected
Ideas" section of the original PEP 622 had previously been declined [2_]).

However, the review process for this PEP strongly suggested that not only did
the contradictions that Tobias mentioned in his email exist, but they were
also concerning enough to cast doubts on the syntax proposal presented in
PEP 634. Accordingly, this PEP was changed to go even further than PEP 634,
and largely abandon alignment between the sequence matching syntax and the
existing iterable unpacking syntax (effectively answering "Not really, at
least as far as the exact syntax is concerned" to the first question raised in
the DLS'20 paper [9_]: "Can we extend a feature like iterable unpacking to
work for more general object and data layouts?").
This resulted in a complete reversal of the goals of the PEP: rather than
attempting to emphasise the similarities between assignment and pattern
matching, the PEP now attempts to make sure that assignment target syntax
isn't being reused *at all*, reducing the likelihood of incorrect inferences
being drawn about the new construct based on experience with existing ones.

Finally, before completing the 3rd iteration of the proposal (which dropped
inferred patterns entirely), the PEP author spent quite a bit of time
reflecting on the following entries in PEP 20:

* Explicit is better than implicit.
* Special cases aren't special enough to break the rules.
* In the face of ambiguity, refuse the temptation to guess.

If we start with an explicit syntax, we can always add syntactic shortcuts
later (e.g. consider the recent proposals to add shortcuts for ``Union`` and
``Optional`` type hints only after years of experience with the original more
verbose forms), while if we start out with only the abbreviated forms, then we
don't have any real way to revisit those decisions in a future release.

Specification
=============

This PEP retains the overall ``match``/``case`` statement structure and
semantics from PEP 634, but proposes multiple changes that mean that user
intent is explicitly specified in the concrete syntax rather than needing to
be inferred from the pattern matching context.

In the proposed Abstract Syntax Tree, the semantics are also always explicit,
with no inference required.

The Match Statement
-------------------

Surface syntax::

    match_stmt: "match" subject_expr ':' NEWLINE INDENT case_block+ DEDENT
    subject_expr:
        | star_named_expression ',' star_named_expressions?
        | named_expression
    case_block: "case" (guarded_pattern | open_pattern) ':' block

    guarded_pattern: closed_pattern 'if' named_expression
    open_pattern:
        | as_pattern
        | or_pattern
    closed_pattern:
        | wildcard_pattern
        | group_pattern
        | structural_constraint

Abstract syntax::

    Match(expr subject, match_case* cases)
    match_case = (pattern pattern, expr? guard, stmt* body)

The rules ``star_named_expression``, ``star_named_expressions``,
``named_expression`` and ``block`` are part of the standard Python grammar.

Open patterns are patterns which consist of multiple tokens, and aren't
necessarily terminated by a closing delimiter (for example, ``__ as x``,
``int() | bool()``). To avoid ambiguity for human readers, their usage is
restricted to top level patterns and to group patterns (which are patterns
surrounded by parentheses).

Closed patterns are patterns which either consist of a single token
(i.e. ``__``), or else have a closing delimiter as a required part of their
syntax (e.g. ``[as x, as y]``, ``object{.x as x, .y as y}``).

As in PEP 634, the ``match`` and ``case`` keywords are soft keywords, i.e. they
are not reserved words in other grammatical contexts (including at the start
of a line if there is no colon where expected). This means that they are
recognized as keywords when part of a match statement or case block only, and
are allowed to be used in all other contexts as variable or argument names.

Unlike PEP 634, patterns are explicitly defined as a new kind of node in the
abstract syntax tree - even when surface syntax is shared with existing
expression nodes, a distinct abstract node is emitted by the parser.

For context, ``match_stmt`` is a new alternative for ``compound_statement`` in
the surface syntax and ``Match`` is a new alternative for ``stmt`` in the
abstract syntax.

Match Semantics
^^^^^^^^^^^^^^^

This PEP largely retains the overall pattern matching semantics proposed in
PEP 634.

The proposed syntax for patterns changes significantly, and is discussed in
detail below.

There are also some proposed changes to the semantics of class defined
constraints (class patterns in PEP 634) to eliminate the need to special case
any builtin types (instead, the introduction of dedicated syntax for instance
attribute constraints allows the behaviour needed by those builtin types to be
specified as applying to any type that sets ``__match_args__`` to ``None``).

.. _guards:

Guards
^^^^^^

This PEP retains the guard clause semantics proposed in PEP 634.

However, the syntax is changed slightly to require that when a guard clause
is present, the case pattern must be a *closed* pattern.

This makes it clearer to the reader where the pattern ends and the guard
clause begins. (This is mainly a potential problem with OR patterns, where the
guard clause looks kind of like the start of a conditional expression in the
final pattern. Actually doing that isn't legal syntax, so there's no ambiguity
as far as the compiler is concerned, but the distinction may not be as clear
to a human reader.)

Irrefutable case blocks
^^^^^^^^^^^^^^^^^^^^^^^

The definition of irrefutable case blocks changes slightly in this PEP
relative to PEP 634, as capture patterns no longer exist as a separate concept
from AS patterns.

Aside from that caveat, the handling of irrefutable cases is the same as in
PEP 634:

* wildcard patterns are irrefutable
* AS patterns whose left-hand side is irrefutable
* OR patterns containing at least one irrefutable pattern
* parenthesized irrefutable patterns
* a case block is considered irrefutable if it has no guard and its pattern is
  irrefutable.
* a match statement may have at most one irrefutable case block, and it must
  be last.

.. _patterns:

Patterns
--------

The top-level surface syntax for patterns is as follows::

    open_pattern: # Pattern may use multiple tokens with no closing delimiter
        | as_pattern
        | or_pattern

    as_pattern: [closed_pattern] pattern_as_clause

    or_pattern: '|'.simple_pattern+

    simple_pattern: # Subnode where "as" and "or" patterns must be parenthesised
        | closed_pattern
        | value_constraint

    closed_pattern: # Require a single token or a closing delimiter in pattern
        | wildcard_pattern
        | group_pattern
        | structural_constraint

As described above, the usage of open patterns is limited to top level case
clauses and when parenthesised in a group pattern.

The abstract syntax for patterns explicitly indicates which elements are
subpatterns and which elements are subexpressions or identifiers::

    pattern = MatchAlways
         | MatchValue(matchop op, expr value)
         | MatchSequence(pattern* patterns)
         | MatchMapping(expr* keys, pattern* patterns)
         | MatchAttrs(expr cls, identifier* attrs, pattern* patterns)
         | MatchClass(expr cls, pattern* patterns, identifier* extra_attrs, pattern* extra_patterns)

         | MatchRestOfSequence(identifier? target)
         -- A NULL entry in the MatchMapping key list handles capturing extra mapping keys

         | MatchAs(pattern? pattern, identifier target)
         | MatchOr(pattern* patterns)

AS Patterns
^^^^^^^^^^^

Surface syntax::

    as_pattern: [closed_pattern] pattern_as_clause
    pattern_as_clause: 'as' pattern_capture_target
    pattern_capture_target: !"__" NAME !('.' | '(' | '=')

(Note: the name on the right may not be ``__``.)

Abstract syntax::

    MatchAs(pattern? pattern, identifier target)

An AS pattern matches the closed pattern on the left of the ``as`` keyword
against the subject. If this fails, the AS pattern fails. Otherwise, the AS
pattern binds the subject to the name on the right of the ``as`` keyword and
succeeds.

If no pattern to match is given, the wildcard pattern (``__``) is implied.

To avoid confusion with the `wildcard pattern`_, the double underscore (``__``)
is not permitted as a capture target (this is what ``!"__"`` expresses).

A capture pattern always succeeds. It binds the subject value to the name
using the scoping rules for name binding established for named expressions in
PEP 572. (Summary: the name becomes a local variable in the closest containing
function scope unless there's an applicable ``nonlocal`` or ``global``
statement.)

In a given pattern, a given name may be bound only once. This disallows for
example ``case [as x, as x]: ...`` but allows ``case [as x] | (as x)``.

As an open pattern, the usage of AS patterns is limited to top level case
clauses and when parenthesised in a group pattern. However, several of the
structural constraints allow the use of ``pattern_as_clause`` in relevant
locations to bind extracted elements of the matched subject to local
variables. These are mostly represented in the abstract syntax tree as
``MatchAs`` nodes, aside from the dedicated ``MatchRestOfSequence`` node in
sequence patterns.

OR Patterns
^^^^^^^^^^^

Surface syntax::

    or_pattern: '|'.simple_pattern+

    simple_pattern: # Subnode where "as" and "or" patterns must be parenthesised
        | closed_pattern
        | value_constraint

Abstract syntax::

    MatchOr(pattern* patterns)

When two or more patterns are separated by vertical bars (``|``), this is
called an OR pattern. (A single simple pattern is just that.)

Only the final subpattern may be irrefutable.

Each subpattern must bind the same set of names.

An OR pattern matches each of its subpatterns in turn to the subject, until
one succeeds. The OR pattern is then deemed to succeed. If none of the
subpatterns succeed the OR pattern fails.

Subpatterns are mostly required to be closed patterns, but the parentheses may
be omitted for value constraints.

.. _value_constraints:

Value constraints
^^^^^^^^^^^^^^^^^

Surface syntax::

    value_constraint:
        | eq_constraint
        | id_constraint

    eq_constraint: '==' closed_expr
    id_constraint: 'is' closed_expr

    closed_expr: # Require a single token or a closing delimiter in expression
        | primary
        | closed_factor

    closed_factor: # "factor" is the main grammar node for these unary ops
        | '+' primary
        | '-' primary
        | '~' primary

Abstract syntax::

    MatchValue(matchop op, expr value)
    matchop = EqCheck | IdCheck

The rule ``primary`` is defined in the standard Python grammar, and only
allows expressions that either consist of a single token, or else are required
to end with a closing delimiter.

Value constraints replace PEP 634's literal patterns and value patterns.

Equality constraints are written as ``== EXPR``, while identity constraints
are written as ``is EXPR``.

An equality constraint succeeds if the subject value compares equal to the
value given on the right, while an identity constraint succeeds only if they
are the exact same object.

The expressions to be compared against are largely restricted to either
single tokens (e.g. names, strings, numbers, builtin constants), or else to
expressions that are required to end with a closing delimiter.

The use of the high precedence unary operators is also permitted, as the risk
of perceived ambiguity is low, and being able to specify negative numbers
without parentheses is desirable.

When the same constraint expression occurs multiple times in the same match
statement, the interpreter may cache the first value calculated and reuse it,
rather than repeat the expression evaluation. (As for PEP 634 value patterns,
this cache is strictly tied to a given execution of a given match statement.)

Unlike literal patterns in PEP 634, this PEP requires that complex literals be
parenthesised to be accepted by the parser. See the Deferred Ideas section for
discussion on that point.
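
The difference between the two kinds of value constraint can be sketched in
ordinary Python. This is only an illustration of the comparison semantics
described above: the helper names ``eq_constraint`` and ``id_constraint`` are
hypothetical and are not part of this PEP or of any implementation.

```python
# Hypothetical helpers emulating the two value constraint kinds.
def eq_constraint(subject, value):
    """Emulate ``== EXPR``: succeed when the subject compares equal."""
    return subject == value


def id_constraint(subject, value):
    """Emulate ``is EXPR``: succeed only for the exact same object."""
    return subject is value


SENTINEL = object()

results = {
    # 1 compares equal to True, so an equality constraint would succeed...
    "1 == True": eq_constraint(1, True),
    # ...but they are different objects, so an identity constraint would fail.
    "1 is True": id_constraint(1, True),
    # A sentinel only ever matches itself by identity.
    "SENTINEL is SENTINEL": id_constraint(SENTINEL, SENTINEL),
    "object() is SENTINEL": id_constraint(object(), SENTINEL),
}
```

This is why the later examples use ``is True`` and ``is SENTINEL`` when an
equal-but-distinct object must not match.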

If this PEP were to be adopted in preference to PEP 634, then all literal and
value patterns would instead be written more explicitly as value constraints::

    # Literal patterns
    match number:
        case == 0:
            print("Nothing")
        case == 1:
            print("Just one")
        case == 2:
            print("A couple")
        case == -1:
            print("One less than nothing")
        case == (1-1j):
            print("Good luck with that...")

    # Additional literal patterns
    match value:
        case == True:
            print("True or 1")
        case == False:
            print("False or 0")
        case == None:
            print("None")
        case == "Hello":
            print("Text 'Hello'")
        case == b"World!":
            print("Binary 'World!'")

    # Matching by identity rather than equality
    SENTINEL = object()
    match value:
        case is True:
            print("True, not 1")
        case is False:
            print("False, not 0")
        case is None:
            print("None, following PEP 8 comparison guidelines")
        case is ...:
            print("May be useful when writing __getitem__ methods?")
        case is SENTINEL:
            print("Matches the sentinel by identity, not just value")

    # Matching against variables and attributes
    from enum import Enum
    class Sides(str, Enum):
        SPAM = "Spam"
        EGGS = "eggs"
        ...

    preferred_side = Sides.EGGS
    match entree[-1]:
        case == Sides.SPAM:  # Compares entree[-1] == Sides.SPAM.
            response = "Have you got anything without Spam?"
        case == preferred_side:  # Compares entree[-1] == preferred_side
            response = f"Oh, I love {preferred_side}!"
        case as side:  # Assigns side = entree[-1].
            response = f"Well, could I have their Spam instead of the {side} then?"

Note the ``== preferred_side`` example: using an explicit prefix marker on
constraint expressions removes the restriction to only working with attributes
or literals for value lookups.

The ``== (1-1j)`` example illustrates the use of parentheses to turn any
subexpression into a closed one.

.. _wildcard_pattern:

Wildcard Pattern
^^^^^^^^^^^^^^^^

Surface syntax::

    wildcard_pattern: "__"

Abstract syntax::

    MatchAlways

A wildcard pattern always succeeds. As in PEP 634, it binds no name.

Where PEP 634 chooses the single underscore as its wildcard pattern for
consistency with other languages, this PEP chooses the double underscore as
that has a clearer path towards potentially being made consistent across the
entire language, whereas that path is blocked for ``"_"`` by i18n related use
cases.

Example usage::

    match sequence:
        case [__]:                     # any sequence with a single element
            return True
        case [as start, *__, as end]:  # a sequence with at least two elements
            return start == end
        case __:                       # anything
            return False

Group Patterns
^^^^^^^^^^^^^^

Surface syntax::

    group_pattern: '(' open_pattern ')'

For the syntax of ``open_pattern``, see Patterns above.

A parenthesized pattern has no additional syntax and is not represented in the
abstract syntax tree. It allows users to add parentheses around patterns to
emphasize the intended grouping, and to allow nesting of open patterns when
the grammar requires a closed pattern.

Unlike PEP 634, there is no potential ambiguity with sequence patterns, as
this PEP requires that all sequence patterns be written with square brackets.

Structural constraints
^^^^^^^^^^^^^^^^^^^^^^

Surface syntax::

    structural_constraint:
        | sequence_constraint
        | mapping_constraint
        | attrs_constraint
        | class_constraint

Note: the separate "structural constraint" subcategory isn't used in the
abstract syntax tree, it's merely used as a convenient grouping node in the
surface syntax definition.

Structural constraints are patterns used to both make assertions about complex
objects and to extract values from them.

These patterns may all bind multiple values, either through the use of nested
AS patterns, or else through the use of ``pattern_as_clause`` elements
included in the definition of the pattern.

Sequence constraints
^^^^^^^^^^^^^^^^^^^^

Surface syntax::

    sequence_constraint: '[' [sequence_constraint_elements] ']'
    sequence_constraint_elements: ','.sequence_constraint_element+ ','?
    sequence_constraint_element:
        | star_pattern
        | simple_pattern
        | pattern_as_clause
    star_pattern: '*' (pattern_as_clause | wildcard_pattern)

    simple_pattern: # Subnode where "as" and "or" patterns must be parenthesised
        | closed_pattern
        | value_constraint

    pattern_as_clause: 'as' pattern_capture_target

Abstract syntax::

    MatchSequence(pattern* patterns)

    MatchRestOfSequence(identifier? target)

Sequence constraints allow items within a sequence to be checked and
optionally extracted.

A sequence pattern fails if the subject value is not an instance of
``collections.abc.Sequence``. It also fails if the subject value is an
instance of ``str``, ``bytes`` or ``bytearray`` (see Deferred Ideas for a
discussion on potentially removing the need for this special casing).

A sequence pattern may contain at most one star subpattern. The star
subpattern may occur in any position and is represented in the AST using the
``MatchRestOfSequence`` node.

If no star subpattern is present, the sequence pattern is a fixed-length
sequence pattern; otherwise it is a variable-length sequence pattern.

A fixed-length sequence pattern fails if the length of the subject sequence is
not equal to the number of subpatterns.

A variable-length sequence pattern fails if the length of the subject sequence
is less than the number of non-star subpatterns.

The length of the subject sequence is obtained using the builtin ``len()``
function (i.e., via the ``__len__`` protocol). However, the interpreter may
cache this value in a similar manner as described for value constraint
expressions.

A fixed-length sequence pattern matches the subpatterns to corresponding items
of the subject sequence, from left to right. Matching stops (with a failure)
as soon as a subpattern fails. If all subpatterns succeed in matching their
corresponding item, the sequence pattern succeeds.
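
The type and length checks described above, together with fixed-length
matching, can be sketched in plain Python as follows. This is a minimal
illustration rather than the reference implementation: the function names are
hypothetical, and each subpattern is modelled as an ordinary callable that
returns ``True`` or ``False`` for a single item.

```python
from collections.abc import Sequence


def sequence_shape_ok(subject, num_non_star, has_star):
    """Hypothetical helper: apply the subject type and length checks."""
    if not isinstance(subject, Sequence):
        return False
    # str, bytes and bytearray are sequences, but are explicitly excluded.
    if isinstance(subject, (str, bytes, bytearray)):
        return False
    if has_star:
        # Variable-length: at least as many items as non-star subpatterns.
        return len(subject) >= num_non_star
    # Fixed-length: exactly as many items as subpatterns.
    return len(subject) == num_non_star


def match_fixed_length(subject, subpatterns):
    """Hypothetical helper: match item predicates from left to right."""
    if not sequence_shape_ok(subject, len(subpatterns), has_star=False):
        return False
    # all() short-circuits, so matching stops at the first failing item.
    return all(check(item) for check, item in zip(subpatterns, subject))
```

For instance, ``match_fixed_length(["example.org", 80], [is_str, is_int])``
(with suitable predicates) succeeds, while the same call with the string
``"ab"`` as the subject fails before any item is inspected.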

A variable-length sequence pattern first matches the leading non-star
subpatterns to the corresponding items of the subject sequence, as for a
fixed-length sequence. If this succeeds, the star subpattern matches a list
formed of the remaining subject items, with items removed from the end
corresponding to the non-star subpatterns following the star subpattern. The
remaining non-star subpatterns are then matched to the corresponding subject
items, as for a fixed-length sequence.

Subpatterns are mostly required to be closed patterns, but the parentheses may
be omitted for value constraints. Sequence elements may also be captured
unconditionally without parentheses.

Note: where PEP 634 allows all the same syntactic flexibility as iterable
unpacking in assignment statements, this PEP restricts sequence patterns
specifically to the square bracket form. Given that the open and parenthesised
forms are far more popular than square brackets for iterable unpacking, this
helps emphasise that iterable unpacking and sequence matching are *not* the
same operation. It also avoids the parenthesised form's ambiguity problem
between single element sequence patterns and group patterns.

Mapping constraints
^^^^^^^^^^^^^^^^^^^

Surface syntax::

    mapping_constraint: '{' [mapping_constraint_elements] '}'
    mapping_constraint_elements: ','.key_value_constraint+ ','?
    key_value_constraint:
        | closed_expr pattern_as_clause
        | closed_expr ':' simple_pattern
        | double_star_capture
    double_star_capture: '**' pattern_as_clause

(Note that ``**__`` is deliberately disallowed by this syntax, as additional
mapping entries are ignored by default.)

``closed_expr`` is defined above, under value constraints.

Abstract syntax::

    MatchMapping(expr* keys, pattern* patterns)

Mapping constraints allow keys and values within a mapping to be checked and
values to optionally be extracted.

A mapping pattern fails if the subject value is not an instance of
``collections.abc.Mapping``.

A mapping pattern succeeds if every key given in the mapping pattern is
present in the subject mapping, and the pattern for each key matches the
corresponding item of the subject mapping.

The presence of keys is checked using the two argument form of the ``get``
method and a unique sentinel value, which offers the following benefits:

* no exceptions need to be created in the lookup process
* mappings that implement ``__missing__`` (such as
  ``collections.defaultdict``) only match on keys that they already contain,
  they don't implicitly add keys

A mapping pattern may not contain duplicate key values. If duplicate keys are
detected when checking the mapping pattern, the pattern is considered invalid,
and a ``ValueError`` is raised. While it would theoretically be possible to
check for duplicated constant keys at compile time, no such check is currently
defined or implemented.

(Note: This semantic description is derived from the PEP 634 reference
implementation, which differs from the PEP 634 specification text at time of
writing. The implementation seems reasonable, so amending the PEP text seems
like the best way to resolve the discrepancy.)

If a ``'**' as NAME`` double star pattern is present, that name is bound to a
``dict`` containing any remaining key-value pairs from the subject mapping
(the dict will be empty if there are no additional key-value pairs).

A mapping pattern may contain at most one double star pattern, and it must be
last.

Value subpatterns are mostly required to be closed patterns, but the
parentheses may be omitted for value constraints (the ``:`` key/value
separator is still required to ensure the entry doesn't look like an ordinary
comparison operation).

Mapping values may also be captured unconditionally using the ``KEY as NAME``
form, without either parentheses or the ``:`` key/value separator.
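
The key lookup strategy described in this section can be sketched in ordinary
Python. This is a minimal illustration under stated assumptions, not the
reference implementation: ``match_mapping`` and ``_MISSING`` are hypothetical
names, value subpatterns are modelled as callables, and the duplicate key
check is not modelled because a ``dict`` of patterns cannot hold duplicates.

```python
from collections import defaultdict
from collections.abc import Mapping

_MISSING = object()  # unique sentinel: can never be a real mapping value


def match_mapping(subject, key_patterns):
    """Hypothetical helper emulating a mapping constraint.

    ``key_patterns`` maps each required key to a predicate for its value.
    Returns ``None`` on failure, otherwise a ``dict`` of the remaining
    key-value pairs (what a ``** as NAME`` clause would bind).
    """
    if not isinstance(subject, Mapping):
        return None
    for key, value_pattern in key_patterns.items():
        # Two argument get(): no KeyError is raised, and __missing__ is
        # never triggered, so defaultdict does not grow new keys.
        value = subject.get(key, _MISSING)
        if value is _MISSING or not value_pattern(value):
            return None
    return {k: v for k, v in subject.items() if k not in key_patterns}


def any_value(value):
    """Stand-in for an unconditional ``KEY as NAME`` capture."""
    return True


address = {"host": "example.org", "port": 80}
rest = match_mapping(address, {"host": any_value})

empty_default = defaultdict(list)
no_match = match_mapping(empty_default, {"host": any_value})
```

After running the sketch, ``empty_default`` is still empty, which is the
behaviour the second bullet point above is designed to guarantee.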

Instance attribute constraints
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Surface syntax::

    attrs_constraint:
        | name_or_attr '{' [attrs_constraint_elements] '}'
    attrs_constraint_elements: ','.attr_value_pattern+ ','?
    attr_value_pattern:
        | '.' NAME pattern_as_clause
        | '.' NAME value_constraint
        | '.' NAME ':' simple_pattern
        | '.' NAME

Abstract syntax::

    MatchAttrs(expr cls, identifier* attrs, pattern* patterns)

Instance attribute constraints allow an instance's type to be checked and
attributes to optionally be extracted.

An instance attribute constraint may not repeat the same attribute name
multiple times. Attempting to do so will result in a syntax error.

An instance attribute pattern fails if the subject is not an instance of
``name_or_attr``. This is tested using ``isinstance()``.

If ``name_or_attr`` is not an instance of the builtin ``type``, ``TypeError``
is raised.

If no attribute subpatterns are present, the constraint succeeds if the
``isinstance()`` check succeeds. Otherwise:

- Each given attribute name is looked up as an attribute on the subject.

  - If this raises an exception other than ``AttributeError``, the exception
    bubbles up.
  - If this raises ``AttributeError`` the constraint fails.
  - Otherwise, the subpattern associated with the attribute name is matched
    against the attribute value. If no subpattern is specified, the wildcard
    pattern is assumed. If this fails, the constraint fails. If it succeeds,
    the match proceeds to the next attribute.

- If all attribute subpatterns succeed, the constraint as a whole succeeds.

Instance attribute constraints allow ducktyping checks to be implemented by
using ``object`` as the required instance type (e.g.
``case object{.host as host, .port as port}:``).

The syntax being proposed here could potentially also be used as the basis for
a new syntax for retrieving multiple attributes from an object instance in one
assignment statement (e.g. ``host, port = addr{.host, .port}``).
See the Deferred Ideas section for further discussion of this point.

Class defined constraints
^^^^^^^^^^^^^^^^^^^^^^^^^

Surface syntax::

    class_constraint:
        | name_or_attr '(' ')'
        | name_or_attr '(' positional_patterns ','? ')'
        | name_or_attr '(' class_constraint_attrs ')'
        | name_or_attr '(' positional_patterns ',' class_constraint_attrs ')'
    positional_patterns: ','.positional_pattern+
    positional_pattern:
        | simple_pattern
        | pattern_as_clause
    class_constraint_attrs:
        | '**' '{' [attrs_constraint_elements] '}'

Abstract syntax::

    MatchClass(expr cls, pattern* patterns, identifier* extra_attrs, pattern* extra_patterns)

Class defined constraints allow a sequence of common attributes to be
specified on a class and checked positionally, rather than needing to specify
the attribute names in every related match pattern.

As for instance attribute patterns:

- a class defined pattern fails if the subject is not an instance of
  ``name_or_attr``. This is tested using ``isinstance()``.
- if ``name_or_attr`` is not an instance of the builtin ``type``,
  ``TypeError`` is raised.

Regardless of whether or not any arguments are present, the subject is checked
for a ``__match_args__`` attribute using the equivalent of
``getattr(cls, "__match_args__", _SENTINEL)``.

If this raises an exception the exception bubbles up.

If the returned value is not a list, tuple, or ``None``, the conversion fails
and ``TypeError`` is raised at runtime.

This means that only types that actually define ``__match_args__`` will be
usable in class defined patterns. Types that don't define ``__match_args__``
will still be usable in instance attribute patterns.

If ``__match_args__`` is ``None``, then only a single positional subpattern is
permitted. Attempting to specify additional attribute patterns either
positionally or using the double star syntax will cause ``TypeError`` to be
raised at runtime.
This positional subpattern is then matched against the entire subject, allowing a type check to be combined with another match pattern (e.g. checking both the type and contents of a container, or the type and value of a number). If ``__match_args__`` is a list or tuple, then the class defined constraint is converted to an instance attributes constraint as follows: - if only the double star attribute constraints subpattern is present, matching proceeds as if for the equivalent instance attributes constraint. - if there are more positional subpatterns than the length of ``__match_args__`` (as obtained using ``len()``), ``TypeError`` is raised. - Otherwise, positional pattern ``i`` is converted to an attribute pattern using ``__match_args__[i]`` as the attribute name. - if any element in ``__match_args__`` is not a string, ``TypeError`` is raised. - once the positional patterns have been converted to attribute patterns, then they are combined with any attribute constraints given in the double star attribute constraints subpattern, and matching proceeds as if for the equivalent instance attributes constraint. Note: the ``__match_args__ is None`` handling in this PEP replaces the special casing of ``bool``, ``bytearray``, ``bytes``, ``dict``, ``float``, ``frozenset``, ``int``, ``list``, ``set``, ``str``, and ``tuple`` in PEP 634. However, the optimised fast path for those types is retained in the implementation. Design Discussion ================= Requiring explicit qualification of simple names in match patterns ------------------------------------------------------------------ The first iteration of this PEP accepted the basic premise of PEP 634 that iterable unpacking syntax would provide a good foundation for defining a new syntax for pattern matching. 
During the review process, however, two major and one minor ambiguity problems were highlighted that arise directly from that core assumption: * most problematically, when binding simple names by default is extended to PEP 634's proposed class pattern syntax, the ``ATTR=TARGET_NAME`` construct binds to the right without using the ``as`` keyword, and uses the normal assignment-to-the-left sigil (``=``) to do it! * when binding simple names by default is extended to PEP 634's proposed mapping pattern syntax, the ``KEY: TARGET_NAME`` construct binds to the right without using the ``as`` keyword * using a PEP 634 capture pattern together with an AS pattern (``TARGET_NAME_1 as TARGET_NAME_2``) gives an odd "binds to both the left and right" behaviour The third revision of this PEP accounted for this problem by abandoning the alignment with iterable unpacking syntax, and instead requiring that all uses of bare simple names for anything other than a variable lookup be qualified by a preceding sigil or keyword: * ``as NAME``: local variable binding * ``.NAME``: attribute lookup * ``== NAME``: variable lookup * ``is NAME``: variable lookup * any other usage: variable lookup The key benefit of this approach is that it makes interpretation of simple names in patterns a local activity: a leading ``as`` indicates a name binding, a leading ``.`` indicates an attribute lookup, and anything else is a variable lookup (regardless of whether we're reading a subpattern or a subexpression). 
With the syntax now proposed in this PEP, the problematic cases identified above no longer read poorly: * ``.ATTR as TARGET_NAME`` is more obviously a binding than ``ATTR=TARGET_NAME`` * ``KEY as TARGET_NAME`` is more obviously a binding than ``KEY: TARGET_NAME`` * ``(as TARGET_NAME_1) as TARGET_NAME_2`` is more obviously two bindings than ``TARGET_NAME_1 as TARGET_NAME_2`` Resisting the temptation to guess --------------------------------- PEP 635 looks at the way pattern matching is used in other languages, and attempts to use that information to make plausible predictions about the way pattern matching will be used in Python: * wanting to extract values to local names will *probably* be more common than wanting to match against values stored in local names * wanting comparison by equality will *probably* be more common than wanting comparison by identity * users will *probably* be able to at least remember that bare names bind values and attribute references look up values, even if they can't figure that out for themselves without reading the documentation or having someone tell them To be clear, I think these predictions actually *are* plausible. However, I also don't think we need to guess about this up front: I think we can start out with a more explicit syntax that requires users to state their intent using a prefix marker (either ``as``, ``==``, or ``is``), and then reassess the situation in a few years based on how pattern matching is actually being used *in Python*. 
At that point, we'll be able to choose amongst at least the following options:

* deciding the explicit syntax is concise enough, and not changing anything
* adding inferred identity constraints for one or more of ``None``, ``...``,
  ``True`` and ``False``
* adding inferred equality constraints for other literals (potentially
  including complex literals)
* adding inferred equality constraints for attribute lookups
* adding either inferred equality constraints or inferred capture patterns for
  bare names

All of those ideas could be considered independently on their own merits,
rather than being a potential barrier to introducing pattern matching in the
first place.

If any of these syntactic shortcuts were to eventually be introduced, they'd
also be straightforward to explain in terms of the underlying more explicit
syntax (the leading ``as``, ``==``, or ``is`` would just be getting inferred
by the parser, without the user needing to provide it explicitly). At the
implementation level, only the parser should need to be changed, as the
existing AST nodes could be reused.

Interaction with caching of attribute lookups in local variables
----------------------------------------------------------------

One of the major changes between this PEP and PEP 634 is to use ``== EXPR``
for equality constraint lookups, rather than only offering ``NAME.ATTR``. The
original motivation for this was to avoid the semantic conflict with regular
assignment targets, where ``NAME.ATTR`` is already used in assignment
statements to set attributes, so if ``NAME.ATTR`` were the *only* syntax for
symbolic value matching, then we would be pre-emptively ruling out any future
attempts to allow matching against single patterns using the existing
assignment statement syntax. The current motivation is more about the general
desire to avoid guessing about users' intent, and instead requiring them to
state it explicitly in the syntax.
However, even within match statements themselves, the ``name.attr`` syntax for
value patterns has an undesirable interaction with local variable assignment,
where routine refactorings that would be semantically neutral for any other
Python statement introduce a major semantic change when applied to a PEP 634
style match statement.

Consider the following code::

    while value < self.limit:
        ... # Some code that adjusts "value"

The attribute lookup can be safely lifted out of the loop and only performed
once::

    _limit = self.limit
    while value < _limit:
        ... # Some code that adjusts "value"

With the marker prefix based syntax proposal in this PEP, value constraints
would be similarly tolerant of match patterns being refactored to use a local
variable instead of an attribute lookup, with the following two statements
being functionally equivalent::

    match expr:
        case {"key": == self.target}:
            ... # Handle the case where 'expr["key"] == self.target'
        case __:
            ... # Handle the non-matching case

    _target = self.target
    match expr:
        case {"key": == _target}:
            ... # Handle the case where 'expr["key"] == self.target'
        case __:
            ... # Handle the non-matching case

By contrast, when using PEP 634's value and capture pattern syntaxes that omit
the marker prefix, the following two statements wouldn't be equivalent at all::

    # PEP 634's value pattern syntax
    match expr:
        case {"key": self.target}:
            ... # Handle the case where 'expr["key"] == self.target'
        case _:
            ... # Handle the non-matching case

    # PEP 634's capture pattern syntax
    _target = self.target
    match expr:
        case {"key": _target}:
            ... # Matches any mapping with "key", binding its value to _target
        case _:
            ... # Handle the non-matching case

This PEP ensures the original semantics are retained under this style of
simplistic refactoring: use ``== name`` to force interpretation of the result
as a value constraint, use ``as name`` for a name binding.
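The PEP 634 behaviour described above can be reproduced with the syntax
implemented in Python 3.10 and later. The following is a minimal sketch: the
``Config`` class and both function names are hypothetical, introduced only for
illustration::

    class Config:
        target = "expected"


    def match_with_attribute(expr, cfg):
        # Dotted name: PEP 634 treats this as a value pattern (equality check)
        match expr:
            case {"key": cfg.target}:
                return "matched"
            case _:
                return "no match"


    def match_with_local(expr, cfg):
        # "Refactored" version: the bare name is now a capture pattern
        _target = cfg.target
        match expr:
            case {"key": _target}:
                return f"captured {_target!r}"
            case _:
                return "no match"


    subject = {"key": "something else"}
    result_attribute = match_with_attribute(subject, Config)
    result_local = match_with_local(subject, Config)

The first function rejects the subject, while the second accepts it and rebinds
``_target``, even though the two differ only by hoisting the attribute lookup
into a local variable.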
PEP 634's proposal to offer only the shorthand syntax, with no explicitly prefixed form, means that the primary answer on offer is "Well, don't do that, then, only compare against attributes in namespaces, don't compare against simple names". PEP 622's walrus pattern syntax had another odd interaction where it might not bind the same object as the exact same walrus expression in the body of the case clause, but PEP 634 fixed that discrepancy by replacing walrus patterns with AS patterns (where the fact that the value bound to the name on the RHS might not be the same value as returned by the LHS is a standard feature common to all uses of the "as" keyword). Using existing comparison operators as the value constraint prefix -------------------------------------------------------------------- If the benefit of a dedicated value constraint prefix is accepted, then the next question is to ask exactly what that prefix should be. The initially published version of this PEP proposed using the previously unused ``?`` symbol as the prefix for equality constraints, and ``?is`` as the prefix for identity constraints. When reviewing the PEP, Steven D'Aprano presented a compelling counterproposal [5_] to use the existing comparison operators (``==`` and ``is``) instead. 
There were a few concerns with ``==`` as a prefix that kept it from being chosen as the prefix in the initial iteration of the PEP: * for common use cases, it's even more visually noisy than ``?``, as a lot of folks with PEP 8 trained aesthetic sensibilities are going to want to put a space between it and the following expression, effectively making it a 3 character prefix instead of 1 * when used in a mapping pattern, there needs to be a space between the ``:`` key/value separator and the ``==`` prefix, or the tokeniser will split them up incorrectly (getting ``:=`` and ``=`` instead of ``:`` and ``==``) * when used in an OR pattern, there needs to be a space between the ``|`` pattern separator and the ``==`` prefix, or the tokeniser will split them up incorrectly (getting ``|=`` and ``=`` instead of ``|`` and ``==``) * if used in a PEP 634 style class pattern, there needs to be a space between the ``=`` keyword separator and the ``==`` prefix, or the tokeniser will split them up incorrectly (getting ``==`` and ``=`` instead of ``=`` and ``==``) Rather than introducing a completely new symbol, Steven's proposed resolution to this verbosity problem was to retain the ability to omit the prefix marker in syntactically unambiguous cases. While the idea of omitting the prefix marker was accepted for the second revision of the proposal, it was dropped again in the third revision due to ambiguity concerns. Instead, the following points apply: * for class patterns, other syntax changes allow equality constraints to be written as ``.ATTR == EXPR``, and identity constraints to be written as ``.ATTR is EXPR``, both of which are quite easy to read * for mapping patterns, the extra syntactic noise is just tolerated (at least for now) * for OR patterns, the extra syntactic noise is just tolerated (at least for now). 
However, `membership constraints`_ may offer a future path to reducing the need to combine OR patterns with equality constraints (instead, the values to be checked against would be collected as a set, list, or tuple). Given that perspective, PEP 635's arguments against using ``?`` as part of the pattern matching syntax held for this proposal as well, and so the PEP was amended accordingly. Using ``__`` as the wildcard pattern marker ------------------------------------------- PEP 635 makes a solid case that introducing ``?`` *solely* as a wildcard pattern marker would be a bad idea. With the syntax for value constraints changed to use existing comparison operations rather than ``?`` and ``?is``, that argument holds for this PEP as well. However, as noted by Thomas Wouters in [6_], PEP 634's choice of ``_`` remains problematic as it would likely mean that match patterns would have a *permanent* difference from all other parts of Python - the use of ``_`` in software internationalisation and at the interactive prompt means that there isn't really a plausible path towards using it as a general purpose "skipped binding" marker. ``__`` is an alternative "this value is not needed" marker drawn from a Stack Overflow answer [7_] (originally posted by the author of this PEP) on the various meanings of ``_`` in existing Python code. This PEP also proposes adopting an implementation technique that limits the scope of the associated special casing of ``__`` to the parser: defining a new AST node type (``MatchAlways``) specifically for wildcard markers, rather than passing it through to the AST as a ``Name`` node. Within the parser, ``__`` still means either a regular name or a wildcard marker in a match pattern depending on where you were in the parse tree, but within the rest of the compiler, ``Name("__")`` is still a normal variable name, while ``MatchAlways()`` is always a wildcard marker in a match pattern. 
Unlike ``_``, the lack of other use cases for ``__`` means that there would be a plausible path towards restoring identifier handling consistency with the rest of the language by making ``__`` mean "skip this name binding" everywhere in Python: * in the interpreter itself, deprecate loading variables with the name ``__``. This would make reading from ``__`` emit a deprecation warning, while writing to it would initially be unchanged. To avoid slowing down all name loads, this could be handled by having the compiler emit additional code for the deprecated name, rather than using a runtime check in the standard name loading opcodes. * after a suitable number of releases, change the parser to emit a new ``SkippedBinding`` AST node for all uses of ``__`` as an assignment target, and update the rest of the compiler accordingly * consider making ``__`` a true hard keyword rather than a soft keyword This deprecation path couldn't be followed for ``_``, as there's no way for the interpreter to distinguish between attempts to read back ``_`` when nominally used as a "don't care" marker, and legitimate reads of ``_`` as either an i18n text translation function or as the last statement result at the interactive prompt. Names starting with double-underscores are also already reserved for use by the language, whether that is for compile time constants (i.e. ``__debug__``), special methods, or class attribute name mangling, so using ``__`` here would be consistent with that existing approach. Representing patterns explicitly in the Abstract Syntax Tree ------------------------------------------------------------ PEP 634 doesn't explicitly discuss how match statements should be represented in the Abstract Syntax Tree, instead leaving that detail to be defined as part of the implementation. 
As a result, while the reference implementation of PEP 634 definitely works
(and formed the basis of the reference implementation of this PEP), it does
contain a significant design flaw: despite the notes in PEP 635 that patterns
should be considered as distinct from expressions, the reference
implementation goes ahead and represents them in the AST as expression nodes.

The result is an AST that isn't very abstract at all: nodes that should be
compiled completely differently (because they're patterns rather than
expressions) are represented the same way, and the type system of the
implementation language (e.g. C for CPython) can't offer any assistance in
keeping track of which subnodes should be ordinary expressions and which
should be subpatterns.

Rather than continuing with that approach, this PEP has instead defined a new
explicit "pattern" node in the AST, which allows the patterns and their
permitted subnodes to be defined explicitly in the AST itself, making the code
implementing the new feature clearer, and allowing the C compiler to provide
more assistance in keeping track of when the code generator is dealing with
patterns or expressions.

This change in implementation approach is actually orthogonal to the surface
syntax changes proposed in this PEP, so it could still be adopted even if the
rest of the PEP were to be rejected.

Changes to sequence patterns
----------------------------

This PEP makes one notable change to sequence patterns relative to PEP 634:

* only the square bracket form of sequence pattern is supported. Neither open
  (no delimiters) nor tuple style (parentheses as delimiters) sequence patterns
  are supported.

Relative to PEP 634, sequence patterns are also significantly affected by the
change to require explicit qualification of capture patterns and value
constraints, as it means ``case [a, b, c]:`` must instead be written as
``case [as a, as b, as c]:`` and ``case [0, 1]:`` must instead be written as
``case [== 0, == 1]:``.
With the syntax for sequence patterns no longer being derived directly from the
syntax for iterable unpacking, it no longer made sense to keep the syntactic
flexibility that had been included in the original syntax proposal purely for
consistency with iterable unpacking.

Allowing open and tuple style sequence patterns didn't increase expressivity,
only ambiguity of intent (especially relative to group patterns), and
encouraged readers down the path of viewing pattern matching syntax as
intrinsically linked to assignment target syntax (which the PEP 634 authors
have stated multiple times is not a desirable path to have readers take, and a
view the author of this PEP now shares, despite disagreeing with it
originally).

Changes to mapping patterns
---------------------------

This PEP makes two notable changes to mapping patterns relative to PEP 634:

* value capturing is written as ``KEY as NAME`` rather than as ``KEY: NAME``
* a wider range of keys are permitted: any "closed expression", rather than
  only literals and attribute references

As discussed above, the first change is part of ensuring that all binding
operations with the target name to the right of a subexpression or pattern
use the ``as`` keyword.

The second change is mostly a matter of simplifying the parser and code
generator code by reusing the existing expression handling machinery. The
restriction to closed expressions is designed to help reduce ambiguity as to
where the key expression ends and the match pattern begins. This mostly allows
a superset of what PEP 634 allows, except that complex literals must be
written in parentheses (at least for now).
Adapting PEP 635's mapping pattern examples to the syntax proposed in this PEP:: match json_pet: case {"type": == "cat", "name" as name, "pattern" as pattern}: return Cat(name, pattern) case {"type": == "dog", "name" as name, "breed" as breed}: return Dog(name, breed) case __: raise ValueError("Not a suitable pet") def change_red_to_blue(json_obj): match json_obj: case { 'color': (== 'red' | == '#FF0000') }: json_obj['color'] = 'blue' case { 'children' as children }: for child in children: change_red_to_blue(child) For reference, the equivalent PEP 634 syntax:: match json_pet: case {"type": "cat", "name": name, "pattern": pattern}: return Cat(name, pattern) case {"type": "dog", "name": name, "breed": breed}: return Dog(name, breed) case _: raise ValueError("Not a suitable pet") def change_red_to_blue(json_obj): match json_obj: case { 'color': ('red' | '#FF0000') }: json_obj['color'] = 'blue' case { 'children': children }: for child in children: change_red_to_blue(child) Changes to class patterns ------------------------- This PEP makes several notable changes to class patterns relative to PEP 634: * the syntactic alignment with class instantiation is abandoned as being actively misleading and unhelpful. Instead, a new dedicated syntax for checking additional attributes is introduced that draws inspiration from mapping patterns rather than class instantiation * a new dedicated syntax for simple ducktyping that will work for any class is introduced * the special casing of various builtin and standard library types is supplemented by a general check for the existence of a ``__match_args__`` attribute with the value of ``None`` As discussed above, the first change has two purposes: * it's part of ensuring that all binding operations with the target name to the right of a subexpression or pattern use the ``as`` keyword. Using ``=`` to assign to the right is particularly problematic. 
* it's part of ensuring that all uses of simple names in patterns have a prefix
  that indicates their purpose (in this case, a leading ``.`` to indicate an
  attribute lookup)

The syntactic alignment with class instantiation was also judged to be
unhelpful in general, as class patterns are about matching patterns against
attributes, while class instantiation is about matching call arguments to
parameters in class constructors, which may not bear much resemblance to the
resulting instance attributes at all.

The second change is intended to make it easier to use pattern matching for the
"ducktyping" style checks that are already common in Python.

The concrete syntax proposal for these patterns then arose from viewing
instances as mappings of attribute names to values, and combining the attribute
lookup syntax (``.ATTR``), with the mapping pattern syntax ``{KEY: PATTERN}``
to give ``cls{.ATTR: PATTERN}``.

Allowing ``cls{.ATTR}`` to mean the same thing as ``cls{.ATTR: __}`` was a
matter of considering the leading ``.`` sufficient to render the name usage
unambiguous (it's clearly an attribute reference, whereas matching against a
variable key in a mapping pattern would be arguably ambiguous).

The final change just supplements a CPython-internal-only check in the PEP 634
reference implementation by making it the default behaviour that classes get if
they don't define ``__match_args__`` (the optimised fast path for the builtin
and standard library types named in PEP 634 is retained).
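The matching rules for instance attribute constraints can be approximated in
ordinary Python. The sketch below is illustrative only and is not how the
reference implementation works: ``match_attrs`` and ``Address`` are
hypothetical names, and each subpattern is modelled as a simple predicate
(``None`` standing in for the wildcard)::

    def match_attrs(subject, cls, **attr_checks):
        """Return the extracted attributes as a dict, or None on failure."""
        if not isinstance(cls, type):
            raise TypeError("instance attribute constraints require a class")
        if not isinstance(subject, cls):
            return None
        extracted = {}
        for name, check in attr_checks.items():
            try:
                value = getattr(subject, name)
            except AttributeError:
                return None  # Missing attribute: the constraint fails
            # Any other exception propagates to the caller
            if check is not None and not check(value):
                return None
            extracted[name] = value
        return extracted


    class Address:
        def __init__(self, host, port=None):
            self.host = host
            if port is not None:
                self.port = port


    # Rough equivalent of: case object{.host as host, .port as port}
    both = match_attrs(Address("example.org", 8080), object,
                       host=None, port=None)
    # Fails, since this instance has no "port" attribute
    missing_port = match_attrs(Address("example.org"), object,
                               host=None, port=None)
    # Rough equivalent of: case object{.host as host}
    host_only = match_attrs(Address("example.org"), object, host=None)

Using ``object`` as the required type gives the ducktyping behaviour: only the
presence of the named attributes matters, not the class of the subject.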
Adapting the class matching example `linked from PEP 635 `_ shows that for purely positional class matching, the main impact comes from the changes to value constraints and name binding, not from the class matching changes:: match expr: case BinaryOp(== '+', as left, as right): return eval_expr(left) + eval_expr(right) case BinaryOp(== '-', as left, as right): return eval_expr(left) - eval_expr(right) case BinaryOp(== '*', as left, as right): return eval_expr(left) * eval_expr(right) case BinaryOp(== '/', as left, as right): return eval_expr(left) / eval_expr(right) case UnaryOp(== '+', as arg): return eval_expr(arg) case UnaryOp(== '-', as arg): return -eval_expr(arg) case VarExpr(as name): raise ValueError(f"Unknown value of: {name}") case float() | int(): return expr case __: raise ValueError(f"Invalid expression value: {repr(expr)}") For reference, the equivalent PEP 634 syntax:: match expr: case BinaryOp('+', left, right): return eval_expr(left) + eval_expr(right) case BinaryOp('-', left, right): return eval_expr(left) - eval_expr(right) case BinaryOp('*', left, right): return eval_expr(left) * eval_expr(right) case BinaryOp('/', left, right): return eval_expr(left) / eval_expr(right) case UnaryOp('+', arg): return eval_expr(arg) case UnaryOp('-', arg): return -eval_expr(arg) case VarExpr(name): raise ValueError(f"Unknown value of: {name}") case float() | int(): return expr case _: raise ValueError(f"Invalid expression value: {repr(expr)}") The changes to the class pattern syntax itself are more relevant when checking for named attributes and extracting their values without relying on ``__match_args__``:: match expr: case object{.host as host, .port as port}: pass case object{.host as host}: pass Compare this to the PEP 634 equivalent, where it really isn't clear which names are referring to attributes of the match subject and which names are referring to local variables:: match expr: case object(host=host, port=port): pass case object(host=host): pass In this 
specific case, that ambiguity doesn't matter (since the attribute and variable names are the same), but in the general case, knowing which is which will be critical to reasoning correctly about the code being read. Deferred Ideas ============== Inferred value constraints -------------------------- As discussed above, this PEP doesn't rule out the possibility of adding inferred equality and identity constraints in the future. These could be particularly valuable for literals, as it is quite likely that many "magic" strings and numbers with self-evident meanings will be written directly into match patterns, rather than being stored in named variables. (Think constants like ``None``, or obviously special numbers like ``0`` and ``1``, or strings where their contents are as descriptive as any variable name, rather than cryptic checks against opaque numbers like ``739452``) Making some required parentheses optional ----------------------------------------- The PEP currently errs heavily on the side of requiring parentheses in the face of potential ambiguity. However, there are a number of cases where it at least arguably goes too far, mostly involving AS patterns with an explicit pattern. In any position that requires a closed pattern, AS patterns may end up starting with doubled parentheses, as the nested pattern is also required to be a closed pattern: ``((OPEN PTRN) as NAME)`` Due to the requirement that the subpattern be closed, it should be reasonable in many of these cases (e.g. sequence pattern subpatterns) to accept ``CLOSED_PTRN as NAME`` directly. Further consideration of this point has been deferred, as making required parentheses optional is a backwards compatible change, and hence relaxing the restrictions later can be considered on a case-by-case basis. 
Accepting complex literals as closed expressions ------------------------------------------------ PEP 634's reference implementation includes a lot of special casing of binary operations in both the parser and the rest of the compiler in order to accept complex literals without accepting arbitrary binary numeric operations on literal values. Ideally, this problem would be dealt with at the parser layer, with the parser directly emitting a Constant AST node prepopulated with a complex number. If that was the way things worked, then complex literals could be accepted through a similar mechanism to any other literal. This isn't how complex literals are handled, however. Instead, they're passed through to the AST as regular ``BinOp`` nodes, and then the constant folding pass on the AST resolves them down to ``Constant`` nodes with a complex value. For the parser to resolve complex literals directly, the compiler would need to be able to tell the tokenizer to generate a distinct token type for imaginary numbers (e.g. ``INUMBER``), which would then allow the parser to handle ``NUMBER + INUMBER`` and ``NUMBER - INUMBER`` separately from other binary operations. Alternatively, a new ``ComplexNumber`` AST node type could be defined, which would allow the parser to notify the subsequent compiler stages that a particular node should specifically be a complex literal, rather than an arbitrary binary operation. Then the parser could accept ``NUMBER + NUMBER`` and ``NUMBER - NUMBER`` for that node, while letting the AST validation for ``ComplexNumber`` take care of ensuring that the real and imaginary parts of the literal were real and imaginary numbers as expected. For now, this PEP has postponed dealing with this question, and instead just requires that complex literals be parenthesised in order to be used in value constraints and as mapping pattern keys. 
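The current handling of complex literals can be checked with the standard
``ast`` module. A minimal sketch (the variable names are illustrative)::

    import ast

    # "1 + 2j" is parsed as a binary operation, not as a single literal
    node = ast.parse("1 + 2j", mode="eval").body
    is_binop = isinstance(node, ast.BinOp)
    is_addition = isinstance(node.op, ast.Add)

    # The operands are ordinary Constant nodes: a real and an imaginary number
    left_value = node.left.value
    right_value = node.right.value

    # The complex value only appears once the expression is compiled and run
    combined = eval(compile("1 + 2j", "<example>", "eval"))

Because the parser sees nothing more specific than a ``BinOp``, a grammar that
wants to accept ``1 + 2j`` without also accepting ``1 + 2`` needs extra
validation somewhere, which is the special casing discussed above.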
Allowing negated constraints in match patterns
----------------------------------------------

With the syntax proposed in this PEP, it isn't permitted to write ``!= expr``
or ``is not expr`` as a match pattern.

Both of these forms have clear potential interpretations as a negated equality
constraint (i.e. ``x != expr``) and a negated identity constraint
(i.e. ``x is not expr``).

However, it's far from clear either form would come up often enough to justify
the dedicated syntax, so the possible extension has been deferred pending
further community experience with match statements.

.. _membership constraints:

Allowing membership checks in match patterns
---------------------------------------------

The syntax used for equality and identity constraints would be straightforward
to extend to membership checks: ``in container``.

One downside of the proposals in both this PEP and PEP 634 is that checking
for multiple values in the same case doesn't look like any existing container
membership check in Python::

    # PEP 634's literal patterns
    match value:
        case 0 | 1 | 2 | 3:
            ...

    # This PEP's equality constraints
    match value:
        case == 0 | == 1 | == 2 | == 3:
            ...

Allowing inferred equality constraints under this PEP would only make it look
like the PEP 634 example; it still wouldn't look like the equivalent ``if``
statement header (``if value in {0, 1, 2, 3}:``).

Membership constraints would provide a more explicit, but still concise, way
to check if the match subject was present in a container, and it would look
the same as an ordinary containment check::

    match value:
        case in {0, 1, 2, 3}:
            ...
        case in {one, two, three, four}:
            ...
        case in range(4): # It would accept any container, not just literal sets
            ...

Such a feature would also be readily extensible to allow all kinds of case
clauses without any further syntax updates, simply by defining
``__contains__`` appropriately on a custom class definition.
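A similar effect can be approximated with the PEP 634 syntax implemented in
Python 3.10 and later, by pairing a capture pattern with a guard. This is only
a sketch: ``EvenNumbers`` and ``classify`` are hypothetical names, and the
guard form is a stand-in rather than the membership syntax described above::

    class EvenNumbers:
        """Hypothetical container whose membership test is computed."""

        def __contains__(self, item):
            return isinstance(item, int) and item % 2 == 0


    def classify(value):
        match value:
            case candidate if candidate in {0, 1, 2, 3}:
                return "small"
            case candidate if candidate in range(4, 10):
                return "medium"
            case candidate if candidate in EvenNumbers():
                return "even"
            case _:
                return "other"

Each guard delegates to the container's own ``__contains__``, which is the
extension point a dedicated ``in container`` pattern would also rely on.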
However, while this does seem like a useful extension, and a good way to resolve this PEP's verbosity problem when combining multiple equality checks in an OR pattern, it isn't essential to making match statements a valuable addition to the language, so it seems more appropriate to defer it to a separate proposal, rather than including it here. Inferring a default type for instance attribute constraints ----------------------------------------------------------- The dedicated syntax for instance attribute constraints means that ``object`` could be omitted from ``object{.ATTR}`` to give ``{.ATTR}`` without introducing any syntactic ambiguity (if no class was given, ``object`` would be implied, just as it is for the base class list in class definitions). However, it's far from clear saving six characters is worth making it harder to visually distinguish mapping patterns from instance attribute patterns, so allowing this has been deferred as a topic for possible future consideration. Avoiding special cases in sequence patterns ------------------------------------------- Sequence patterns in both this PEP and PEP 634 currently special case ``str``, ``bytes``, and ``bytearray`` as specifically *never* matching a sequence pattern. This special casing could potentially be removed if we were to define a new ``collections.abc.AtomicSequence`` abstract base class for types like these, where they're conceptually a single item, but still implement the sequence protocol to allow random access to their component parts. 
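The existing special casing can be observed with the PEP 634 syntax
implemented in Python 3.10 and later. In this small sketch,
``is_sequence_match`` is a hypothetical helper::

    def is_sequence_match(subject):
        match subject:
            case [*_]:
                return True
            case _:
                return False


    samples = (["a", "b"], ("a", "b"), "ab", b"ab", bytearray(b"ab"))
    results = {type(obj).__name__: is_sequence_match(obj) for obj in samples}

The list and tuple match the sequence pattern, while ``str``, ``bytes`` and
``bytearray`` do not, even though all five implement the sequence protocol.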
Expression syntax to retrieve multiple attributes from an instance
------------------------------------------------------------------

The instance attribute pattern syntax has been designed such that it could
be used as the basis for a general purpose syntax for retrieving multiple
attributes from an object in a single expression::

    host, port = obj{.host, .port}

Similar to slice syntax only being allowed inside bracket subscripts, the
``.attr`` syntax for naming attributes would only be allowed inside brace
subscripts.

This idea isn't required for pattern matching to be useful, so it isn't part of
this PEP. However, it's mentioned as a possible path towards making pattern
matching feel more integrated into the rest of the language, rather than
existing forever in its own completely separated world.

Expression syntax to retrieve multiple items from a container
------------------------------------------------------------------

If the brace subscript syntax were to be accepted for instance attribute
pattern matching, and then subsequently extended to offer general purpose
extraction of multiple attributes, then it could be extended even further to
allow for retrieval of multiple items from containers based on the syntax
used for mapping pattern matching::

    host, port = obj{"host", "port"}
    first, last = obj{0, -1}

Again, this idea isn't required for pattern matching to be useful, so it isn't
part of this PEP. As with retrieving multiple attributes, however, it is
included as an example of the proposed pattern matching syntax inspiring ideas
for making object deconstruction easier in general.
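For comparison, the closest existing spelling of both ideas uses the standard
``operator`` module. The sketch below is not part of the proposal; the sample
objects (``addr``, ``record``) are illustrative::

    from operator import attrgetter, itemgetter
    from types import SimpleNamespace

    addr = SimpleNamespace(host="example.org", port=8080)

    # Existing counterpart of: host, port = addr{.host, .port}
    host, port = attrgetter("host", "port")(addr)

    record = {"host": "example.org", "port": 8080}

    # Existing counterpart of: host, port = record{"host", "port"}
    item_host, item_port = itemgetter("host", "port")(record)

    # Existing counterpart of: first, last = seq{0, -1}
    first, last = itemgetter(0, -1)(["a", "b", "c"])

These helpers take attribute names as strings and build an intermediate
callable, which is the indirection a dedicated brace subscript syntax would
remove.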

Rejected Ideas
==============

Restricting permitted expressions in value constraints and mapping pattern keys
-------------------------------------------------------------------------------

While it's entirely technically possible to restrict the kinds of expressions
permitted in value constraints and mapping pattern keys to just attribute
lookups and constant literals (as PEP 634 does), there isn't any clear runtime
value in doing so, so this PEP proposes allowing any kind of primary expression
(primary expressions are an existing node type in the grammar that includes
things like literals, names, attribute lookups, function calls, container
subscripts, parenthesised groups, etc), as well as high precedence unary
operations (``+``, ``-``, ``~``) on primary expressions.

While PEP 635 does emphasise several times that literal patterns and value
patterns are not full expressions, it doesn't ever articulate a concrete benefit
that is obtained from that restriction (just a theoretical appeal to it being
useful to separate static checks from dynamic checks, which a code style
tool could still enforce, even if the compiler itself is more permissive).

The last time we imposed such a restriction was for decorator expressions and
the primary outcome of that was that users had to put up with years of awkward
syntactic workarounds (like nesting arbitrary expressions inside function calls
that just returned their argument) to express the behaviour they wanted before
the language definition was finally updated to allow arbitrary expressions and
let users make their own decisions about readability.

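The decorator workaround mentioned above can be sketched as follows. This is
an illustration only: ``identity``, ``make_recorder``, ``recorders``, and
``seen`` are hypothetical names invented for the example. Before the decorator
grammar was relaxed in Python 3.9 (PEP 614), a subscript expression such as
``@recorders[0]`` was rejected by the compiler, so it had to be wrapped in a
call that simply returned its argument::

    def identity(value):
        """Return the argument unchanged (the pre-3.9 workaround helper)."""
        return value


    seen = []


    def make_recorder(bucket):
        """Build a decorator that records the decorated function's name."""
        def record(func):
            bucket.append(func.__name__)
            return func
        return record


    recorders = [make_recorder(seen)]


    # The call syntax was always permitted, so wrapping the subscript in a
    # pass-through call worked even on versions that rejected
    # ``@recorders[0]``.
    @identity(recorders[0])
    def handler():
        return "handled"

The wrapper contributes nothing at runtime; it exists purely to satisfy the
older, more restrictive grammar.
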
The situation in PEP 634 that bears a resemblance to the situation with
decorator expressions is that arbitrary expressions are technically supported
in value patterns, they just require awkward workarounds where either all the
values to match need to be specified in a helper class that is placed before
the match statement::

    # Allowing arbitrary match targets with PEP 634's value pattern syntax
    class mt:
        value = func()

    match expr:
        case (_, mt.value):
            ... # Handle the case where 'expr[1] == func()'

Or else they need to be written as a combination of a capture pattern and a
guard expression::

    # Allowing arbitrary match targets with PEP 634's guard expressions
    match expr:
        case (_, _matched) if _matched == func():
            ... # Handle the case where 'expr[1] == func()'

This PEP proposes not requiring any such workarounds, and instead
supporting arbitrary value constraints from the start::

    match expr:
        case (__, == func()):
            ... # Handle the case where 'expr[1] == func()'

Whether actually writing that kind of code is a good idea would be a topic for
style guides and code linters, not the language compiler.

In particular, if static analysers can't follow certain kinds of dynamic checks,
then they can limit the permitted expressions at analysis time, rather than the
compiler restricting them at compile time.

There are also some kinds of expressions that are almost certain to give
nonsensical results (e.g. ``yield``, ``yield from``, ``await``) due to the
pattern caching rule, where the number of times the constraint expression
actually gets evaluated will be implementation dependent. Even here, the PEP
takes the view of letting users write nonsense if they really want to.

Aside from the recently updated decorator expressions, another situation where
Python's formal syntax offers full freedom of expression that is almost never
used in practice is in ``except`` clauses: the exceptions to match against
almost always take the form of a simple name, a dotted name, or a tuple of
those, but the language grammar permits arbitrary expressions at that point.
This is a good indication that Python's user base can be trusted to
take responsibility for finding readable ways to use permissive language
features, by avoiding writing hard to read constructs even when they're
permitted by the compiler.

This permissiveness comes with a real concrete benefit on the implementation
side: dozens of lines of match statement specific code in the compiler are
replaced by simple calls to the existing code for compiling expressions
(including in the AST validation pass, the AST optimization pass, the symbol
table analysis pass, and the code generation pass). This implementation
benefit would accrue not just to CPython, but to every other Python
implementation looking to add match statement support.


Requiring the use of constraint prefix markers for mapping pattern keys
-----------------------------------------------------------------------

The initial (unpublished) draft of this proposal suggested requiring mapping
pattern keys be value constraints, just as PEP 634 requires that they be valid
literal or value patterns::

    import constants

    match config:
        case {== "route": route}:
            process_route(route)
        case {== constants.DEFAULT_PORT: sub_config, **rest}:
            process_config(sub_config, rest)

However, the extra characters were syntactically noisy and unlike its use in
value constraints (where it distinguishes them from non-pattern expressions),
the prefix doesn't provide any additional information here that isn't already
conveyed by the expression's position as a key within a mapping pattern.

Accordingly, the proposal was simplified to omit the marker prefix from mapping
pattern keys.

This omission also aligns with the fact that containers may incorporate both
identity and equality checks into their lookup process - they don't purely
rely on equality checks, as would be incorrectly implied by the use of the
equality constraint prefix.


Allowing the key/value separator to be omitted for mapping value constraints
----------------------------------------------------------------------------

Instance attribute patterns allow the ``:`` separator to be omitted when
writing attribute value constraints like ``case object{.attr == expr}``.

Offering a similar shorthand for mapping value constraints was considered, but
permitting it allows thoroughly baffling constructs like ``case {0 == 0}:``
where the compiler knows this is the key ``0`` with the value constraint
``== 0``, but a human reader sees the tautological comparison operation
``0 == 0``. With the key/value separator included, the intent is more obvious to
a human reader as well: ``case {0: == 0}:``


Reference Implementation
========================

A draft reference implementation for this PEP [3_] has been derived from Brandt
Bucher's reference implementation for PEP 634 [4_].

Relative to the text of this PEP, the draft reference implementation has not
yet complemented the special casing of several builtin and standard library
types in ``MATCH_CLASS`` with the more general check for ``__match_args__``
being set to ``None``. Class defined patterns also currently still accept
classes that don't define ``__match_args__``.

All other modified patterns have been updated to follow this PEP rather than
PEP 634.

Unparsing for match patterns has not yet been migrated to the updated v3 AST.

The AST validator for match patterns has not yet been implemented.

The AST validator in general has not yet been reviewed to ensure that it is
checking that only expression nodes are being passed in where expression nodes
are expected.

The examples in this PEP have not yet been converted to test cases, so could
plausibly contain typos and other errors.

Several of the old PEP 634 tests are still to be converted to new SyntaxError
tests.

The documentation has not yet been updated.


Acknowledgments
===============

The PEP 622 and PEP 634/635/636 authors, as the proposal in this PEP is merely
an attempt to improve the readability of an already well-constructed idea by
proposing that starting with a more explicit syntax and potentially introducing
syntactic shortcuts for particularly common operations later is a better option
than attempting to *only* define the shortcut version. For areas of the
specification where the two PEPs are the same (or at least very similar), the
text describing the intended behaviour in this PEP is often derived directly
from the PEP 634 text.

Steven D'Aprano, who made a compelling case that the key goals of this PEP could
be achieved by using existing comparison tokens to offer the ability to override
the compiler when our guesses as to "what most users will want most of the time"
are inevitably incorrect for at least some users some of the time, and by
retaining some of PEP 634's syntactic sugar (with a slightly different semantic
definition) to obtain the same level of brevity as PEP 634 in most situations.
(Paul Sokolovsky also independently suggested using ``==`` instead of ``?`` as a
more easily understood prefix for equality constraints).

Thomas Wouters, whose publication of PEP 640 and public review of the structured
pattern matching proposals persuaded the author of this PEP to continue
advocating for a wildcard pattern syntax that a future PEP could plausibly turn
into a hard keyword that always skips binding a reference in any location a
simple name is expected, rather than continuing indefinitely as the match
pattern specific soft keyword that is proposed here.

Joao Bueno and Jim Jewett for nudging the PEP author to take a closer look at
the proposed syntax for subelement capturing within class patterns and mapping
patterns (particularly the problems with "capturing to the right"). This
review is what prompted the significant changes between v2 and v3 of the
proposal.


References
==========

.. [1] Post explaining the syntactic novelties in PEP 622
   https://mail.python.org/archives/list/python-dev@python.org/message/2VRPDW4EE243QT3QNNCO7XFZYZGIY6N3/

.. [2] Declined pull request proposing to list this as a Rejected Idea in PEP 622
   https://github.com/python/peps/pull/1564

.. [3] In-progress reference implementation for this PEP
   https://github.com/ncoghlan/cpython/tree/pep-642-constraint-patterns

.. [4] PEP 634 reference implementation
   https://github.com/python/cpython/pull/22917

.. [5] Steven D'Aprano's cogent criticism of the first published iteration of this PEP
   https://mail.python.org/archives/list/python-dev@python.org/message/BTHFWG6MWLHALOD6CHTUFPHAR65YN6BP/

.. [6] Thomas Wouters' initial review of the structured pattern matching proposals
   https://mail.python.org/archives/list/python-dev@python.org/thread/4SBR3J5IQUYE752KR7C6432HNBSYKC5X/

.. [7] Stack Overflow answer regarding the use cases for ``_`` as an identifier
   https://stackoverflow.com/questions/5893163/what-is-the-purpose-of-the-single-underscore-variable-in-python/5893946#5893946

.. [8] Pre-publication draft of "Precise Semantics for Pattern Matching"
   https://github.com/markshannon/pattern-matching/blob/master/precise_semantics.rst

.. [9] Kohn et al., Dynamic Pattern Matching with Python
   https://gvanrossum.github.io/docs/PyPatternMatching.pdf


.. _Appendix A:

Appendix A -- Full Grammar
==========================

Here is the full modified grammar for ``match_stmt``, replacing Appendix A
in PEP 634.

Notation used beyond standard EBNF is as per PEP 634:

- ``'KWD'`` denotes a hard keyword
- ``"KWD"`` denotes a soft keyword
- ``SEP.RULE+`` is shorthand for ``RULE (SEP RULE)*``
- ``!RULE`` is a negative lookahead assertion

::

    match_stmt: "match" subject_expr ':' NEWLINE INDENT case_block+ DEDENT
    subject_expr:
        | star_named_expression ',' [star_named_expressions]
        | named_expression
    case_block: "case" (guarded_pattern | open_pattern) ':' block

    guarded_pattern: closed_pattern 'if' named_expression
    open_pattern: # Pattern may use multiple tokens with no closing delimiter
        | as_pattern
        | or_pattern

    as_pattern: [closed_pattern] pattern_as_clause
    as_pattern_with_inferred_wildcard: pattern_as_clause
    pattern_as_clause: 'as' pattern_capture_target
    pattern_capture_target: !"__" NAME !('.' | '(' | '=')

    or_pattern: '|'.simple_pattern+

    simple_pattern: # Subnode where "as" and "or" patterns must be parenthesised
        | closed_pattern
        | value_constraint

    value_constraint:
        | eq_constraint
        | id_constraint
    eq_constraint: '==' closed_expr
    id_constraint: 'is' closed_expr

    closed_expr: # Require a single token or a closing delimiter in expression
        | primary
        | closed_factor

    closed_factor: # "factor" is the main grammar node for these unary ops
        | '+' primary
        | '-' primary
        | '~' primary

    closed_pattern: # Require a single token or a closing delimiter in pattern
        | wildcard_pattern
        | group_pattern
        | structural_constraint

    wildcard_pattern: "__"

    group_pattern: '(' open_pattern ')'

    structural_constraint:
        | sequence_constraint
        | mapping_constraint
        | attrs_constraint
        | class_constraint

    sequence_constraint: '[' [sequence_constraint_elements] ']'
    sequence_constraint_elements: ','.sequence_constraint_element+ ','?
    sequence_constraint_element:
        | star_pattern
        | simple_pattern
        | as_pattern_with_inferred_wildcard
    star_pattern: '*' (pattern_as_clause | wildcard_pattern)

    mapping_constraint: '{' [mapping_constraint_elements] '}'
    mapping_constraint_elements: ','.key_value_constraint+ ','?
    key_value_constraint:
        | closed_expr pattern_as_clause
        | closed_expr ':' simple_pattern
        | double_star_capture
    double_star_capture: '**' pattern_as_clause

    attrs_constraint:
        | name_or_attr '{' [attrs_constraint_elements] '}'
    name_or_attr: attr | NAME
    attr: name_or_attr '.' NAME
    attrs_constraint_elements: ','.attr_value_constraint+ ','?
    attr_value_constraint:
        | '.' NAME pattern_as_clause
        | '.' NAME value_constraint
        | '.' NAME ':' simple_pattern
        | '.' NAME

    class_constraint:
        | name_or_attr '(' ')'
        | name_or_attr '(' positional_patterns ','? ')'
        | name_or_attr '(' class_constraint_attrs ')'
        | name_or_attr '(' positional_patterns ',' class_constraint_attrs ')'
    positional_patterns: ','.positional_pattern+
    positional_pattern:
        | simple_pattern
        | as_pattern_with_inferred_wildcard
    class_constraint_attrs:
        | '**' '{' [attrs_constraint_elements] '}'


.. _Appendix B:

Appendix B: Summary of Abstract Syntax Tree changes
===================================================

The following new nodes are added to the AST by this PEP::

    stmt = ...
          | ...
          | Match(expr subject, match_case* cases)
          | ...
          ...

    match_case = (pattern pattern, expr? guard, stmt* body)

    pattern = MatchAlways
         | MatchValue(matchop op, expr value)
         | MatchSequence(pattern* patterns)
         | MatchMapping(expr* keys, pattern* patterns)
         | MatchAttrs(expr cls, identifier* attrs, pattern* patterns)
         | MatchClass(expr cls, pattern* patterns, identifier* extra_attrs, pattern* extra_patterns)

         | MatchRestOfSequence(identifier? target)
         -- A NULL entry in the MatchMapping key list handles capturing extra mapping keys

         | MatchAs(pattern? pattern, identifier target)
         | MatchOr(pattern* patterns)

          attributes (int lineno, int col_offset, int? end_lineno, int? end_col_offset)

    matchop = EqCheck | IdCheck


.. _Appendix C:

Appendix C: Summary of changes relative to PEP 634
==================================================

The overall ``match``/``case`` statement syntax and the guard expression syntax
remain the same as they are in PEP 634.

Relative to PEP 634 this PEP makes the following key changes:

* a new ``pattern`` type is defined in the AST, rather than reusing the
  ``expr`` type for patterns
* the new ``MatchAs`` and ``MatchOr`` AST nodes are moved from the ``expr``
  type to the ``pattern`` type
* the wildcard pattern changes from ``_`` (single underscore) to ``__`` (double
  underscore), and gains a dedicated ``MatchAlways`` node in the AST
* due to ambiguity of intent, value patterns and literal patterns are removed
* a new expression category is introduced: "closed expressions"
* closed expressions are either primary expressions, or a closed expression
  preceded by one of the high precedence unary operators (``+``, ``-``, ``~``)
* a new pattern type is introduced: "value constraint patterns"
* value constraints have a dedicated ``MatchValue`` AST node rather than
  allowing a combination of ``Constant`` (literals), ``UnaryOp``
  (negative numbers), ``BinOp`` (complex numbers), and ``Attribute`` (attribute
  lookups)
* value constraint patterns are either equality constraints or identity
  constraints
* equality constraints use ``==`` as a prefix marker on an otherwise
  arbitrary closed expression: ``== EXPR``
* identity constraints use ``is`` as a prefix marker on an otherwise
  arbitrary closed expression: ``is EXPR``
* due to ambiguity of intent, capture patterns are removed. All capture
  operations use the ``as`` keyword (even in sequence matching) and are
  represented in the AST as either ``MatchAs`` or ``MatchRestOfSequence``
  nodes.
* to reduce verbosity in AS patterns, ``as NAME`` is permitted, with the same
  meaning as ``__ as NAME``
* sequence patterns change to *require* the use of square brackets, rather than
  offering the same syntactic flexibility as assignment targets (assignment
  statements allow iterable unpacking to be indicated by any use of a tuple
  separated target, with or without surrounding parentheses or square brackets)
* sequence patterns gain a dedicated ``MatchSequence`` AST node rather than
  reusing ``List``
* mapping patterns change to allow arbitrary closed expressions as keys
* mapping patterns gain a dedicated ``MatchMapping`` AST node rather than
  reusing ``Dict``
* to reduce verbosity in mapping patterns, ``KEY : __ as NAME`` may be shortened
  to ``KEY as NAME``
* class patterns no longer use individual keyword argument syntax for attribute
  matching. Instead they use double-star syntax, along with a variant on mapping
  pattern syntax with a dot prefix on the attribute names
* class patterns gain a dedicated ``MatchClass`` AST node rather than
  reusing ``Call``
* to reduce verbosity, class attribute matching allows ``:`` to be omitted when
  the pattern to be matched starts with ``==``, ``is``, or ``as``
* class patterns treat any class that sets ``__match_args__`` to ``None`` as
  accepting a single positional pattern that is matched against the entire
  object (avoiding the special casing required in PEP 634)
* class patterns raise ``TypeError`` when used with an object that does not
  define ``__match_args__``
* dedicated syntax for ducktyping is added, such that ``case cls{...}:`` is
  roughly equivalent to ``case cls(**{...}):``, but skips the check for the
  existence of ``__match_args__``. This pattern also has a dedicated AST node,
  ``MatchAttrs``

Note that postponing literal patterns also makes it possible to postpone the
question of whether we need an "INUMBER" token in the tokeniser for imaginary
literals.
Without it, the parser can't distinguish complex literals from other
binary addition and subtraction operations on constants, so proposals like
PEP 634 have to do work in later compilation steps to check for correct usage.


.. _Appendix D:

Appendix D: History of changes to this proposal
===============================================

The first published iteration of this proposal mostly followed PEP 634, but
suggested using ``?EXPR`` for equality constraints and ``?is EXPR`` for
identity constraints rather than PEP 634's value patterns and literal patterns.

The second published iteration mostly adopted a counter-proposal from Steven
D'Aprano that kept the PEP 634 style inferred constraints in many situations,
but also allowed the use of ``== EXPR`` for explicit equality constraints, and
``is EXPR`` for explicit identity constraints.

The third published (and current) iteration dropped inferred patterns entirely,
in an attempt to resolve the concerns with the fact that the patterns
``case {key: NAME}:`` and ``case cls(attr=NAME):`` would both bind ``NAME``
despite it appearing to the right of another subexpression without using the
``as`` keyword. The revised proposal also eliminates the possibility of writing
``case TARGET1 as TARGET2:``, which would bind to both of the given names. Of
those changes, the most concerning was ``case cls(attr=TARGET_NAME):``, since it
involved the use of ``=`` with the binding target on the right, the exact
opposite of what happens in assignment statements, function calls, and
function signature declarations.


Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: