diff --git a/pep-0642.rst b/pep-0642.rst index fdcaaab08..28e88d9a5 100644 --- a/pep-0642.rst +++ b/pep-0642.rst @@ -11,43 +11,47 @@ Content-Type: text/x-rst Requires: 634 Created: 26-Sep-2020 Python-Version: 3.10 -Post-History: 31-Oct-2020 +Post-History: 31-Oct-2020, 8-Nov-2020 Resolution: Abstract ======== This PEP covers an alternative syntax proposal for PEP 634's structural pattern -matching that explicitly anchors match expressions in the existing syntax for +matching that explicitly anchors match patterns in the existing syntax for assignment targets, while retaining most semantic aspects of the existing proposal. Specifically, this PEP adopts an additional design restriction that PEP 634's authors considered unreasonable: that any syntax that is common to both assignment targets and match patterns must have a comparable semantic effect, -while any novel match pattern semantics must use syntax which emits a syntax -error when used in an assignment target. +while any novel match pattern semantics must offer syntax which emits a syntax +error when used in an assignment target. It is still considered acceptable to +offer syntactic sugar that is specific to match patterns, as long as there is +an underlying more explicit form that is compatible with assignment targets. 
As a consequence, this PEP proposes the following changes to the proposed match pattern syntax: -* Literal patterns and value patterns are combined into a single new - pattern type: "constraint patterns" -* Constraint patterns are either equality constraints or identity constraints -* Equality constraints use ``?`` as a prefix marker on an otherwise - arbitrary primary expression: ``?EXPR`` -* Identity constraints use `?is` as a prefix marker on an otherwise - arbitrary primary expression: ``?is EXPR`` -* There is no special casing of the ``None``, ``True``, or ``False`` literals -* The constraint expression in an equality constraint may be omitted to give a - non-binding wildcard pattern +* a new pattern type is introduced: "constraint patterns" +* constraint patterns are either equality constraints or identity constraints +* equality constraints use ``==`` as a prefix marker on an otherwise + arbitrary primary expression: ``== EXPR`` +* identity constraints use ``is`` as a prefix marker on an otherwise + arbitrary primary expression: ``is EXPR`` +* value patterns and literal patterns (with some exceptions) are redefined as + "inferred equality constraints", and become a syntactic shorthand for an + equality constraint +* ``None`` and ``...`` are defined as "inferred identity constraints" and become + a syntactic shorthand for an identity constraint +* due to ambiguity of intent, neither ``True`` nor ``False`` are accepted as + implying an inferred constraint (instead requiring the use of an explicit + constraint, a class pattern, or a capture pattern with a guard expression) +* inferred constraints are *not* defined in the Abstract Syntax Tree. 
Instead, + inferred constraints are converted to explicit constraints by the parser +* ``_`` remains the wildcard pattern, but gains a dedicated ``SkippedBinding`` + AST node to distinguish it from the use of ``_`` as an identifier * Mapping patterns change to allow arbitrary primary expressions as keys -* Attempting to use a dotted name as a match pattern is a syntax error rather - than implying an equality constraint -* Attempting to use a literal as a match pattern is a syntax error rather - than implying an equality or identity constraint -* The ``_`` identifier is no longer syntactically special (it is a normal - capture pattern, just as it is an ordinary assignment target) Relationship with other PEPs @@ -56,20 +60,22 @@ Relationship with other PEPs This PEP both depends on and competes with PEP 634 - the PEP author agrees that match statements would be a sufficiently valuable addition to the language to be worth the additional complexity that they add to the learning process, but -disagrees with the idea that "simple name vs literal or attribute lookup" offers -an adequate syntactic distinction between name binding and value lookup -operations in match patterns. +disagrees with the idea that "simple name vs literal or attribute lookup" +really offers an adequate syntactic distinction between name binding and value +lookup operations in match patterns. (Even though this PEP ultimately retained +that shorthand to reduce the verbosity of common use cases, it still redefines +it in terms of a more explicit underlying construct). -By switching the wildcard pattern to "?", this PEP complements the proposal in -PEP 640 to allow the use of wildcard patterns in other contexts where a name -binding is syntactically required, but the application doesn't actually need -the value. 
+By dropping its own proposal to switch the wildcard pattern to ``?`` (and +instead retaining PEP 634's ``_``), this PEP now effectively votes against +the proposal in PEP 640 to allow the use of ``?`` as a general purpose wildcard +marker in name binding operations. Motivation ========== -The original PEP 622 (which was later split into PEPs 634, 635, and 636) +The original PEP 622 (which was later split into PEP 634, PEP 635, and PEP 636) incorporated an unstated but essential assumption in its syntax design: that neither ordinary expressions *nor* the existing assignment target syntax provide an adequate foundation for the syntax used in match patterns. @@ -103,8 +109,8 @@ PEP 635 (and PEP 622 before it) makes a strong case that treating capture patterns as the default usage for simple names in match patterns is the right approach, and provides a number of examples where having names express value constraints by default would be confusing (this difference from C/C++ switch -statement semantics is also a key reason it makes sense to use `match` as the -introductory keyword for the new statement rather than `switch`). +statement semantics is also a key reason it makes sense to use ``match`` as the +introductory keyword for the new statement rather than ``switch``). 
However, PEP 635 doesn't even *try* to make the case for the second assertion, that treating match patterns as a variation on assignment targets also leads to @@ -115,15 +121,20 @@ This PEP instead starts from the assumption that it *is* possible to treat match patterns as a variation on assignment targets, and the only essential differences that emerge relative to the syntactic proposal in PEP 634 are: -* a requirement to use an explicit marker prefix on value lookups rather than - allowing them to be implied by the use of dotted names; and +* a requirement to offer an explicit marker prefix for value lookups rather than + only allowing them to be inferred from the use of dotted names or literals; and * a requirement to use a non-binding wildcard marker other than ``_``. +This PEP concedes the second point in the name of cross-language consistency +(and for lack of a compelling alternative wildcard marker), but proposes +constraint expressions as a way of addressing the first point. + PEP 634 also proposes special casing the literals ``None``, ``True``, and ``False`` so that they're compared by identity when written directly as a literal pattern, but by equality when referenced by a value pattern. This PEP -eliminates those special cases by proposing distinct syntax for matching by -identity and matching by equality. +eliminates the need for those special cases by proposing distinct syntax for +matching by identity and matching by equality (but does accept the convenience +and consistency argument in allowing ``None`` as a shorthand for ``is None``). Specification @@ -140,30 +151,19 @@ retains both the syntax and semantics for the following match pattern variants: Pattern combination (both OR and AS patterns) and guard expressions also remain the same as they are in PEP 634. -Wildcard patterns change their syntactic marker from `_` to `?`. +Constraint patterns are added, offering equality constraints and identity +constraints. 
-Literal patterns and value patterns are replaced by constraint -patterns. +Literal patterns and value patterns are replaced by inferred constraint +patterns, offering inferred equality constraints for strings, numbers and +attribute lookups, and inferred identity constraints for ``None`` and ``...``. Mapping patterns change to allow arbitrary primary expressions for keys, rather than being restricted to literal patterns or value patterns. - -Wildcard patterns ------------------ - -Wildcard patterns change their syntactic marker from `_` to `?`:: - - # Wildcard pattern - match data: - case [?, ?]: - print("Some pair") - print(?) # Error! - -With `?` taking over the role of the non-binding syntactically significant -wildcard marker, `_` reverts to working the same way it does in other assignment -contexts: it operates as an ordinary identifier and hence becomes a normal -capture pattern rather than a special case. +Wildcard patterns remain the same in the proposed surface syntax, but are +explicitly distinguished from the use of ``_`` as an identifier in the Abstract +Syntax Tree produced by the parser. Constraint patterns @@ -172,59 +172,54 @@ Constraint patterns Constraint patterns use the following simplified syntax:: constraint_pattern: id_constraint | eq_constraint - id_constraint: '?' 'is' primary - eq_constraint: '?' primary + eq_constraint: '==' primary + id_constraint: 'is' primary The constraint expression is an arbitrary primary expression - it can be a simple name, a dotted name lookup, a literal, a function call, or any other primary expression. -While the compiler would allow whitespace between ``?`` and ``is`` in -identity constraints (as they're defined as separate tokens), this PEP -proposes that PEP 8 be updated to recommend writing them like ``?is``, as if -they were a combined unary operator. 
- If this PEP were to be adopted in preference to PEP 634, then all literal and -value patterns would instead be written as constraint patterns:: +value patterns could instead be written more explicitly as constraint patterns:: # Literal patterns match number: - case ?0: + case == 0: print("Nothing") - case ?1: + case == 1: print("Just one") - case ?2: + case == 2: print("A couple") - case ?-1: + case == (-1): print("One less than nothing") - case ?(1-1j): + case == (1-1j): print("Good luck with that...") # Additional literal patterns match value: - case ?True: + case == True: print("True or 1") - case ?False: + case == False: print("False or 0") - case ?None: + case == None: print("None") - case ?"Hello": + case == "Hello": print("Text 'Hello'") - case ?b"World!": + case == b"World!": print("Binary 'World!'") - case ?...: + case == ...: print("May be useful when writing __getitem__ methods?") # Matching by identity rather than equality SENTINEL = object() match value: - case ?is True: + case is True: print("True, not 1") - case ?is False: + case is False: print("False, not 0") - case ?is None: + case is None: print("None, following PEP 8 comparison guidelines") - case ?is SENTINEL: + case is SENTINEL: print("Matches the sentinel by identity, not just value") # Constant value patterns @@ -236,17 +231,17 @@ value patterns would instead be written as constraint patterns:: preferred_side = Sides.EGGS match entree[-1]: - case ?Sides.SPAM: # Compares entree[-1] == Sides.SPAM. + case == Sides.SPAM: # Compares entree[-1] == Sides.SPAM. response = "Have you got anything without Spam?" - case ?preferred_side: # Compares entree[-1] == preferred_side + case == preferred_side: # Compares entree[-1] == preferred_side response = f"Oh, I love {preferred_side}!" case side: # Assigns side = entree[-1]. response = f"Well, could I have their Spam instead of the {side} then?" 
-Note the `?preferred_side` example: using an explicit prefix marker on constraint -expressions removes the restriction to only working with bound names for value -lookups. The `?(1-1j)` example illustrates the use of parentheses to turn any -subexpression into an atomic one. +Note the ``== preferred_side`` example: using an explicit prefix marker on +constraint expressions removes the restriction to only working with attributes +or literals for value lookups. The ``== (-1)`` and ``== (1-1j)`` examples +illustrate the use of parentheses to turn any subexpression into an atomic one. This PEP retains the caching property specified for value patterns in PEP 634: if a particular constraint pattern occurs more than once in a given match @@ -256,11 +251,74 @@ clauses. (This implicit caching is less necessary in this PEP, given that explicit local variable caching becomes a valid option, but it still seems a useful property to preserve) + +Inferred constraint patterns +---------------------------- + +Inferred constraint patterns use the syntax proposed for literal and value +patterns in PEP 634, but arrange them differently in the proposed grammar to +allow for a straightforward transformation by the parser into explicit +constraints in the AST output:: + + inferred_constraint_pattern: + | inferred_id_constraint # Emits same parser output as id_constraint + | inferred_eq_constraint # Emits same parser output as eq_constraint + + inferred_id_constraint: + | 'None' + | '...' + + inferred_eq_constraint: + | attr_constraint + | numeric_constraint + | strings + + attr_constraint: attr !('.' | '(' | '=') + attr: name_or_attr '.' 
NAME + name_or_attr: attr | NAME + + numeric_constraint: + | signed_number !('+' | '-') + | signed_number '+' NUMBER + | signed_number '-' NUMBER + signed_number: NUMBER | '-' NUMBER + +The terminology changes slightly to refer to them as a kind of constraint +rather than as a kind of pattern, clearly separating the subelements inside +patterns into "patterns", which define structures and name binding targets to +match against, and "constraints", which look up existing values to compare +against. + +In practice, the key differences between this PEP's inferred constraint patterns +and PEP 634's value patterns and literal patterns are that + +* inferred constraint patterns won't actually exist in the AST definition. + Instead, they'll be replaced by an explicit constraint node, exactly as if + they had been written with the explicit ``==`` or ``is`` prefix +* ``None`` and ``...`` are handled as part of a separate grammar rule, rather + than needing to be handled as a special case of literal patterns in the parser +* equality constraints are inferred for f-strings in addition to being inferred + for string literals +* inferred constraints for ``True`` and ``False`` are dropped entirely on + grounds of ambiguity +* Numeric constraints don't enforce the restriction that they be limited to + complex literals (only that they be limited to single numbers, or the + addition or subtraction of two such numbers) + +Note: even with inferred constraints handled entirely at the parser level, it +would still be possible to limit the inference of equality constraints to +complex numbers if the tokeniser was amended to emit a different token type +(e.g. ``INUMBER``) for imaginary numbers. The PEP doesn't currently propose +making that change (in line with its generally permissive approach), but it +could be amended to do so if desired. 
+ + Mapping patterns ---------------- -Mapping patterns inherit the change to replace literal patterns and constant -value patterns with constraint patterns:: +Mapping patterns inherit the change to replace literal patterns and +value patterns with constraint patterns that allow arbitrary primary +expressions:: mapping_pattern: '{' [items_pattern] '}' items_pattern: ','.key_value_pattern+ ','? @@ -269,8 +327,8 @@ value patterns with constraint patterns:: | '**' capture_pattern However, the constraint marker prefix is not needed in this case, as the fact -this is a key to be looked up rather than a name to be bound is already -implied by its position within a mapping pattern. +this is a key to be looked up rather than a name to be bound can already be +inferred from its position within a mapping pattern. This means that in simple cases, mapping patterns look exactly as they do in PEP 634:: @@ -296,8 +354,28 @@ to match mapping keys:: process_address(address, port) Note: as complex literals are written as binary operations that are evaluated -at compile time, this PEP requires that they be written in parentheses when -used as a key in a mapping pattern. +at compile time, this PEP nominally requires that they be written in parentheses +when used as a key in a mapping pattern. This requirement could be relaxed to +match PEP 634's handling of complex numbers by also accepting +``numeric_constraint`` as defining a valid key expression, and this is how +the draft reference implementation currently works (so the affected PEP 634 +test cases will compile and run as expected). + + +Wildcard patterns +----------------- + +Wildcard patterns retain the same ``_`` syntax in this PEP as they have in PEP +634. However, this PEP explicitly requires that they be represented in the +Abstract Syntax Tree as something *other than* a regular ``Name`` node. 
+ +The draft reference implementation uses the node name ``SkippedBinding`` to +indicate that the node appears where a simple name binding would ordinarily +occur to indicate that nothing should actually be bound, but the exact name of +the node is more an implementation decision than a design one. The key design +requirement is to limit the special casing of ``_`` to the parser and allow the +rest of the compiler to distinguish wildcard patterns from capture patterns +based entirely on information contained within the node itself. Design Discussion @@ -322,14 +400,14 @@ In particular, being able to easily deconstruct mappings into local variables seems likely to be generally useful, even when there's only one mapping variant to be matched:: - {"host": host, "port": port, "mode": ?"TCP"} = settings + {"host": host, "port": port, "mode": =="TCP"} = settings While such code could already be written using a match statement (assuming either this PEP or PEP 634 were to be accepted into the language), an assignment statement level variant should be able to provide standardised exceptions for cases where the right hand side either wasn't a mapping (throwing -`TypeError`), didn't have the specified keys (throwing `KeyError`), or didn't -have the specific values for the given keys (throwing `ValueError`), avoiding +``TypeError``), didn't have the specified keys (throwing ``KeyError``), or didn't +have the specific values for the given keys (throwing ``ValueError``), avoiding the need to write out that exception raising logic in every case. PEP 635 raises the concern that enough aspects of pattern matching semantics @@ -360,10 +438,16 @@ equality constraints instead and nobody is going to get confused by that". Interaction with caching of attribute lookups in local variables ---------------------------------------------------------------- -The major change between this PEP and PEP 634 is the use of `?EXPR` for value -constraint lookups, rather than ``NAME.ATTR``. 
The main motivation for this is -to avoid the semantic conflict with regular assignment targets, where -``NAME.ATTR`` is already used in assignment statements to set attributes. +The major change between this PEP and PEP 634 is to offer ``== EXPR`` for value +constraint lookups, rather than only offering ``NAME.ATTR``. The main motivation +for this is to avoid the semantic conflict with regular assignment targets, where +``NAME.ATTR`` is already used in assignment statements to set attributes, so if +``NAME.ATTR`` were the *only* syntax for symbolic value matching, then +we're pre-emptively ruling out any future attempts to allow matching against +single patterns using the existing assignment statement syntax. We'd also be +failing to provide users with suitable scaffolding to help build correct mental +models of what the shorthand forms mean in match patterns (as compared to what +they mean in assignment targets). However, even within match statements themselves, the ``name.attr`` syntax for value patterns has an undesirable interaction with local variable assignment, @@ -389,24 +473,22 @@ variable instead of an attribute lookup, with the following two statements being functionally equivalent:: match expr: - case {"key": ?self.target}: + case {"key": == self.target}: ... # Handle the case where 'expr["key"] == self.target' - case ?: + case _: ... # Handle the non-matching case _target = self.target match expr: - case {"key": ?_target}: + case {"key": == _target}: ... # Handle the case where 'expr["key"] == self.target' - case ?: + case _: ... 
# Handle the non-matching case -By contrast, PEP 634's attribution of additional semantic significance to the -use of attribute lookup notation means that the following two statements -wouldn't be equivalent at all:: +By contrast, when using the syntactic shorthand that omits the marker prefix, +the following two statements wouldn't be equivalent at all:: - - # PEP 634's value pattern syntax + # PEP 634's value pattern syntax / this PEP's attribute constraint syntax match expr: case {"key": self.target}: ... # Handle the case where 'expr["key"] == self.target' @@ -420,17 +502,15 @@ wouldn't be equivalent at all:: case _: ... # Handle the non-matching case -To be completely clear, the latter statement means the same under this PEP as it -does under PEP 634. The difference is that PEP 634 is relying entirely on the -dotted attribute lookup syntax to identify value patterns, so when the attribute -lookup gets removed, the pattern type immediately changes from a value pattern -to a capture pattern. +This PEP offers a straightforward way to retain the original semantics under +this style of simplistic refactoring: use ``== _target`` to force interpretation +of the result as a constraint pattern instead of a capture pattern (i.e. drop +the no longer applicable syntactic shorthand, and switch to the explicit form). -By contrast, the explicit marker prefix on constraint patterns in this PEP means -that switching from a dotted lookup to a local variable lookup has no effect on -the kind of pattern that the compiler detects - to change it to a capture -pattern, you have to explicitly remove the marker prefix (which will result in -a syntax error if the binding target isn't a simple name). +PEP 634's proposal to offer only the shorthand syntax, with no explicitly +prefixed form, means that the primary answer on offer is "Well, don't do that, +then, only compare against attributes in namespaces, don't compare against +simple names". 
PEP 622's walrus pattern syntax had another odd interaction where it might not bind the same object as the exact same walrus expression in the body of the @@ -440,170 +520,246 @@ might not be the same value as returned by the LHS is a standard feature common to all uses of the "as" keyword). -Using "?" as the constraint pattern prefix ------------------------------------------- +Using existing comparison operators as the constraint pattern prefix +-------------------------------------------------------------------- If the need for a dedicated constraint pattern prefix is accepted, then the next question is to ask exactly what that prefix should be. -With multiple constraint patterns potentially appearing inside larger -structural patterns, using a single punctuation character rather than a keyword -is desirable for brevity. +The initially published version of this PEP proposed using the previously +unused ``?`` symbol as the prefix for equality constraints, and ``?is`` as the +prefix for identity constraints. When reviewing the PEP, Steven D'Aprano +presented a compelling counterproposal [5_] to use the existing comparison +operators (``==`` and ``is``) instead. -Most potential candidates are already used in Python for another unrelated -purpose, or would integrate poorly with other aspects of the pattern matching -syntax (e.g. ``=`` or ``==`` have multiple problems along those lines, in particular -in the way they would combine with ``=`` as a keyword separator in class -patterns, or ``:`` as a key/value separate in mapping patterns). +There were a few concerns with ``==`` as a prefix that kept it from being +chosen as the prefix in the initial iteration of the PEP: -This PEP proposes ``?`` as the prefix marker as it isn't currently used in Python's -core syntax, the proposed usage as a prefix marker won't conflict with its -use in other Python related contexts (e.g. 
looking up object help information in -IPython), and there are plausible mnemonics that may help users to *remember* -what the syntax means even if they can't guess the semantics if exposed to it -without any explanation (mostly that it's a shorthand for the question "Is the -unpacked value at this position equivalent to the value given by the expression? -If not, don't match")). +* for common use cases, it's even more visually noisy than ``?``, as a lot of + folks with PEP 8 trained aesthetic sensibilities are going to want to put + a space between it and the following expression, effectively making it a 3 + character prefix instead of 1 +* when used in a class pattern, there needs to be a space between the ``=`` + keyword separator and the ``==`` prefix, or the tokeniser will split them + up incorrectly (getting ``==`` and ``=`` instead of ``=`` and ``==``) +* when used in a mapping pattern, there needs to be a space between the ``:`` + key/value separator and the ``==`` prefix, or the tokeniser will split them + up incorrectly (getting ``:=`` and ``=`` instead of ``:`` and ``==``) -PEP 635 has a good discussion of the problems with this choice in the context -of using it as the wildcard pattern marker: +Rather than introducing a completely new symbol, Steven's proposed resolution to +this verbosity problem was to retain the ability to omit the prefix marker in +syntactically unambiguous cases. - An alternative that does not suggest an arbitrary number of items would - be ``?``. This is even being proposed independently from pattern matching in - PEP 640. We feel however that using ``?`` as a special "assignment" target is - likely more confusing to Python users than using ``_``. 
It violates Python's - (admittedly vague) principle of using punctuation characters only in ways - similar to how they are used in common English usage or in high school math, - unless the usage is very well established in other programming languages - (like, e.g., using a dot for member access). +This prompted a review of the PEP's goals and underlying concerns, and the +determination that the author's core concern was with the idea of not even +*offering* users the ability to be explicit when they wanted or needed to be, +and instead telling them they could only express the intent that the compiler +inferred that they wanted - they couldn't be more explicit and override the +compiler's default inference when it turned out to be wrong (as it inevitably +will be in at least some cases). - The question mark fails on both counts: its use in other programming - languages is a grab-bag of usages only vaguely suggested by the idea of a - "question". For example, it means "any character" in shell globbing, - "maybe" in regular expressions, "conditional expression" in C and many - C-derived languages, "predicate function" in Scheme, - "modify error handling" in Rust, "optional argument" and "optional chaining" - in TypeScript (the latter meaning has also been proposed for Python by - PEP 505). An as yet unnamed PEP proposes it to mark optional types, - e.g. int?. - - Another common use of ``?`` in programming systems is "help", for example, in - IPython and Jupyter Notebooks and many interactive command-line utilities. - -This PEP takes the view that *not* requiring a marker prefix on value lookups -in match patterns results in a cure that is worse than the disease: Python's -first ever syntax-sensitive value lookup where you can't transparently -replace an attribute lookup with a local variable lookup and maintain semantic -equivalence aside from the exact relative timing of the attribute lookup. 
- -Assuming the requirement for a marker prefix is accepted on those grounds, then -the syntactic bar to meet isn't "Can users *guess* what the chosen symbol means -without anyone ever explaining it to them?" but instead the lower standard -applied when choosing the ``@`` symbol for both decorator expressions and matrix -multiplication and the ``:=`` character combination for assignment expressions: -"Can users *remember* what it means once they've had it explained to them at -least once?". - -This PEP contends that ``?`` will be able to pass that lower standard, and would -pass it even more readily if PEP 640 were also subsequently adopted to allow it -as a general purpose non-binding wildcard marker that doesn't conflict with the -use of ``_`` in application internationalisation use cases. - -PEPs proposing additional meanings for this character would need to take the -pattern matching meaning into account, but wouldn't necessarily fail purely on -that account (e.g. ``@`` was adopted as a binary operator for matrix -multiplication well after its original adoption as a decorator expression -prefix). "Value checking" related use cases such as PEP 505's None-aware -operators would likely fare especially well on that front, but each such -proposal would continue to be judged on a case-by-case basis. +Given that perspective, PEP 635's arguments against using ``?`` as part of the +pattern matching syntax held for this proposal as well, and so the PEP was +amended accordingly. -Using ``?`` as the wildcard pattern ------------------------------------ +Retaining ``_`` as the wildcard pattern marker +---------------------------------------------- PEP 635 makes a solid case that introducing ``?`` *solely* as a wildcard pattern -marker would be a bad idea. Continuing on from the text already quoted in the -previous section: +marker would be a bad idea. 
With the syntax for constraint patterns now changed +to use existing comparison operations rather than ``?`` and ``?is``, that +argument holds for this PEP as well. - In addition, this would put Python in a rather unique position: The - underscore is used as a wildcard pattern in every programming language - with pattern matching that we could find (including C#, Elixir, Erlang, - F#, Grace, Haskell, Mathematica, OCaml, Ruby, Rust, Scala, Swift, and - Thorn). Keeping in mind that many users of Python also work with other - programming languages, have prior experience when learning Python, and - may move on to other languages after having learned Python, we find that - such well-established standards are important and relevant with respect - to readability and learnability. In our view, concerns that this wildcard - means that a regular name received special treatment are not strong enough - to introduce syntax that would make Python special. +However, this PEP also proposes adopting an implementation technique that limits +the scope of the associated special casing of ``_`` to the parser: defining a +new AST node type (``SkippedBinding``) specifically for wildcard markers. -Other languages with pattern matching don't use ``?`` as the wildcard pattern -(they all use ``_``), and without any other usage in Python's syntax, there -wouldn't be any useful prompts to help users remember what ``?`` means when -they encounter it in a match pattern. +Within the parser, ``_`` would still mean either a regular name or a wildcard +marker in a match pattern depending on where you were in the parse tree, but +within the rest of the compiler, ``Name("_")`` would always be a regular name, +while ``SkippedBinding()`` would always be a wildcard marker (with it being +the responsibility of the AST validator to disallow the use of +``SkippedBinding`` outside match patterns). 
-In this PEP, the adoption of ``?`` as the wildcard pattern marker instead comes -from asking the question "What does it mean to omit the constraint expression -from a constraint pattern?", and concluding that "match any value" is a more -useful definition in most situations than reporting a syntax error. - -That said, one possible modification to consider in the name of making code and -concepts easier to share with other languages would be to exempt ``_`` from the -"no repeated names" compiler check. - -With that change, using ``_`` as a wildcard marker would *work* - it would just -also bind the ``_`` name, the same as it does in any other Python assignment -context. +It may also make sense to consider a future proposal that further changes ``_`` +to also skip binding when it's used as part of an iterable unpacking target, but +that's entirely out of the scope of the pattern matching discussion (and would +require careful review of how the routine uses of assignment to ``_`` in +internationalisation use cases and Python interactive prompt implementations +are handled). -No special casing for ``?None``, ``?True``, and ``?False`` ----------------------------------------------------------- +Keeping inferred equality constraints +------------------------------------- -This PEP follows PEP 622 in treating ``None``, ``True`` and ``False`` like any other -value constraint, and comparing them by equality, rather than following PEP -634 in proposing that these literals (and only these literals) be handled specially -and compared via identity. +An early (not widely publicised) draft of this proposal considered keeping +PEP 634's literal patterns, as they don't inherently conflict with assignment +statement syntax the way that PEP 634's value patterns do (trying to assign +to a literal is already a syntax error, whereas assigning to a dotted name +sets the attribute). 
-While writing ``x is None`` is a common (and PEP 8 recommended) practice, nobody -litters their ``if``-``elif`` chains with ``x is True`` or ``x is False`` expressions, -they write ``x`` and ``not x``, both of which compare by value, not identity. -Indeed, PEP 8 explicitly disallows the use ``if x is True:`` and ``if x is False:``, -preferring the forms without any comparison operator at all. +They were removed in the initially published version due to the fact that they +have the same syntax sensitivity problem as attribute constraints do, where +naively attempting to move the literal pattern out to a local variable for +naming clarity turns the value checking literal pattern into a name binding +capture pattern:: -The key problem with special casing is that it doesn't interact properly with -Python's historical practice where "a reference is just a reference, it doesn't -matter how it is spelled in the code". - -Instead, with the special casing proposed in PEP 634, checking against one of -these values directly would behave differently from checking against it when -saved in a variable or attribute:: - - # PEP 634's literal pattern syntax + # PEP 634's literal pattern syntax / this PEP's literal constraint syntax match expr: - case True: - ... # Only handles the case where "expr is True" - - # PEP 634's value pattern syntax - match expr: - case self.expected_match: # Set to 'True' somewhere else - ... # Handles the case where "expr == True" - -By contrast, the explicit prefix syntax proposed in this PEP makes it -straightforward to include both equality constraints and identity constraints, -allowing users to specify directly in their case clauses whether they want to -match by identity or by value. - -This distinction means that case clauses can even be used to provide a dedicated -code path for exact identity matches on arbitrary objects:: - - match value: - case ?is obj: - ... # Handle being given the exact same object - case ?obj: - ... 
# Handle being given an equivalent object - case ?: + case {"port": 443}: + ... # Handle the case where 'expr["port"] == 443' + case _: ... # Handle the non-matching case + HTTPS_PORT = 443 + match expr: + case {"port": HTTPS_PORT}: + ... # Matches any mapping with "port", binding its value to HTTPS_PORT + case _: + ... # Handle the non-matching case + +With explicit equality constraints, this style of refactoring keeps the original +semantics (just as it would for a value lookup in any other statement):: + + # This PEP's equality constraints + match expr: + case {"port": == 443}: + ... # Handle the case where 'expr["port"] == 443' + case _: + ... # Handle the non-matching case + + HTTPS_PORT = 443 + match expr: + case {"port": == HTTPS_PORT}: + ... # Handle the case where 'expr["port"] == 443' + case _: + ... # Handle the non-matching case + +As noted above, both literal patterns and value patterns made their return (in +the form of inferred equality constraints) as a way to address the verbosity +problem of offering explicit ``==`` prefixed equality constraints as the *only* +way to express equality checks. + +However, the presence of the explicit constraint nodes in the AST means that +these special cases can be limited to the parser, with the implicit forms +emitting the same AST nodes as their explicit counterparts. + + +Inferring equality constraints for f-strings +-------------------------------------------- + +This is less a design decision in its own right, and more a consequence of +other design decisions: + +* the tokeniser and parser don't distinguish f-strings from other kinds of + strings, so inferring an explicit equality constraint for f-strings happens + by default when defining the match pattern parser rule for string literals +* the rest of the compiler then treats that output like any other explicit + equality constraint in an AST pattern node (i.e.
allowing arbitrary + expressions) + +This combination of factors makes it awkward to implement a special case that +disallows inferring equality constraints for f-strings while accepting them for +string literals, so the PEP instead opts to just allow them (as they're just as +syntactically unambiguous as any other string in a match pattern). + + +Keeping inferred identity constraints +------------------------------------- + +PEP 635 makes a reasonable case that interpreting a check against ``None`` +as ``== None`` would almost always be incorrect, whereas interpreting it as +``is None`` (as advised in PEP 8) would almost always be what the user intended. + +Similar reasoning applies to checking against ``...``. + +Accordingly, this PEP defines the use of either of these tokens as implying an +identity constraint. + +However, as with inferred equality constraints, inferred identity constraints +become explicit identity constraints in the parser output. + + +Disallowing inferred constraints for ``True`` and ``False`` +----------------------------------------------------------- + +PEP 635 makes a reasonable case that comparing the ``True`` and ``False`` +literals by equality by default is problematic. PEP 8 advises against writing +those comparisons out explicitly in code, so it doesn't make sense for us to +implement a construct that does so implicitly inside the interpreter. + +Unlike PEP 635, however, this PEP proposes to resolve the discrepancy by leaving +these two names out of the initial iteration of the inferred constraint syntax +definition entirely, rather than treating them as implying an identity constraint. + +This means comparisons against ``True`` and ``False`` in match patterns would +need to be written in one of the following forms: + +* comparison by numeric value:: + + case 0: + ... + case 1: + ... + +* comparison by equality (equivalent to comparison by numeric value):: + + case == False: + ... + case == True: + ...
+ +* comparison by identity:: + + case is False: + ... + case is True: + ... + +* comparison by value with class check (equivalent to comparison by identity):: + + case bool(False): + ... + case bool(True): + ... + +* comparison by boolean coercion:: + + case (x, p) if not p: + ... + case (x, p) if p: + ... + +The last approach is the one that would most closely follow PEP 8's guidance +for ``if``-``elif`` chains (comparing by boolean coercion), but it's far from +clear at this point how ``True`` and ``False`` literals will end up being used +in pattern matching use cases. + +In particular, PEP 635's assessment that users will *probably* mean "comparison +by value with class check", which effectively becomes "comparison by identity" +due to ``True`` and ``False`` being singletons, is a genuinely plausible +suggestion. + +However, rather than attempting to guess up front, this PEP proposes that no +shorthand form be offered for these two constants in the initial implementation, +and we instead wait and see if a clearly preferred meaning emerges from actual +usage of the new construct. + + +Inferred constraints rather than implied constraints +---------------------------------------------------- + +This PEP uses the term "inferred constraint" to make it clear that the parser +is making assumptions about the user's intent when converting an inferred +constraint to an explicit one. + +Calling them "implied constraints" instead would also be reasonable, but that +phrasing has a slightly stronger connotation that the inference is always going +to be correct, and one of the motivations of this PEP is that the inference +*isn't* always going to be correct, so we should be offering a way for users to +be explicit when the parser's assumptions don't align with their intent.
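The comparison forms listed for ``True`` and ``False`` give different answers because ``bool`` is a subclass of ``int``. The sketch below uses ordinary expressions rather than match statements, so it runs on existing Python versions; the helper name ``classify`` is hypothetical and only mirrors the equality, identity, and boolean coercion forms described above.

```python
def classify(value):
    """Evaluate ``value`` under three of the comparison styles above.

    Hypothetical helper for illustration; it does not use match syntax.
    """
    return {
        "equal_to_true": value == True,  # what ``case == True:`` would test
        "is_true": value is True,        # what ``case is True:`` would test
        "truthy": bool(value),           # what a guard such as ``if p`` tests
    }


int_one = classify(1)        # equal to True, but not the True singleton
float_one = classify(1.0)    # also equal to True by numeric value
text = classify("yes")       # truthy, but neither equal nor identical
singleton = classify(True)   # passes all three checks
```

Only the ``True`` singleton satisfies the identity check, while ``1`` and ``1.0`` both satisfy the equality check, which is the ambiguity of intent that leads this PEP to offer no shorthand for these two constants.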
+ Deferred Ideas ============== @@ -612,62 +768,55 @@ Allowing negated constraints in match patterns ---------------------------------------------- The requirement that constraint expressions be primary expressions means that -it isn't permitted to write ``?not expr`` or ``?is not expr``. +it isn't permitted to write ``!= expr`` or ``is not expr``. -Both of these forms have reasonably clear potential interpretions as a -negated equality constraint (i.e. ``x != expr``) and a negated identity -constraint (i.e. ``x is not expr``). +Both of these forms have clear potential interpretations as a negated equality +constraint (i.e. ``x != expr``) and a negated identity constraint +(i.e. ``x is not expr``). However, it's far from clear either form would come up often enough to justify the dedicated syntax, so the extension has been deferred pending further community experience with match statements. -Note: the compiler can't enforce the primary expression restriction when asked -to compile an AST tree directly, as parentheses used purely for grouping are -lost in the AST generation process. This means the permitted ``?(not expr)`` -generates the same AST as the syntactically disallowed ``?not expr`` would. -That isn't a problem though, as in the hypothetical future where this feature -was implemented, ``?not expr`` wouldn't generate the same AST as ``?(not expr)``, -it would generate a new AST node that indicated the use of a negated eqaulity -constraint pattern. - Allowing containment checks in match patterns --------------------------------------------- -The syntax used for identity constraints would be straightforward to extend to -containment checks: ``?in container``. +The syntax used for equality and identity constraints would be straightforward +to extend to containment checks: ``in container``.
-One downside of the proposal in this PEP relative to PEP 634 is that checking -against multiple possible values becomes noticably more verbose, especially -for literal value checks:: +One downside of the proposals in both this PEP and PEP 634 is that checking +for multiple values in the same case is quite verbose:: - # PEP 634 literal pattern + # PEP 634's literal patterns / this PEP's inferred constraints match value: case 0 | 1 | 2 | 3: ... - # This PEP's equality constraints +Explicit equality constraints are even worse:: + match value: - case ?0 | ?1 | ?2 | ?3: + case == one | == two | == three | == four: ... Containment constraints would provide a more concise way to check if the match subject was present in a container:: match value: - case ?in {0, 1, 2, 3}: + case in {0, 1, 2, 3}: ... - case ?in range(4): # It would accept any container, not just literal sets + case in {one, two, three, four}: + ... + case in range(4): # It would accept any container, not just literal sets ... Such a feature would also be readily extensible to allow all kinds of case clauses without any further syntax updates, simply by defining ``__contains__`` appropriately on a custom class definition. -However, while this does seem like a useful extension, it isn't essential, so -it seems more appropriate to defer it to a separate proposal, rather than -including it here. +However, while this does seem like a useful extension, it isn't essential to +making match statements a valuable addition to the language, so it seems more +appropriate to defer it to a separate proposal, rather than including it here. Rejected Ideas @@ -707,21 +856,21 @@ statement:: class mt: value = func() match expr: - case (?, mt.value): + case (_, mt.value): ... # Handle the case where 'expr[1] == func()' Or else they need to be written as a combination of a capture pattern and a guard expression:: match expr: - case (?, _matched) if _matched == func(): + case (_, _matched) if _matched == func(): ... 
# Handle the case where 'expr[1] == func()' This PEP proposes skipping requiring any such workarounds, and instead supporting arbitrary value constraints from the start:: match expr: - case ?func(): + case (_, == func()): ... # Handle the case where 'expr == func()' Whether actually writing that kind of code is a good idea would be a topic for @@ -749,56 +898,11 @@ permitted by the compiler. This permissiveness comes with a real concrete benefit on the implementation side: dozens of lines of match statement specific code in the compiler is -replaced by simple calls to the existing code for compiling expressions. This -implementation benefit would accrue not just to CPython, but to every other -Python implementation looking to add match statement support. - - -Keeping literal patterns ------------------------- - -An early (not widely publicised) draft of this proposal considered keeping -PEP 634's literal patterns, as they don't inherently conflict with assignment -statement syntax the way that PEP 634's value patterns do (trying to assign -to a literal is already a syntax error, whereas assigning to a dotted name -sets the attribute). - -They were subsequently removed (replaced by the combination of equality and -identity constraints) due to the fact that they have the same syntax -sensitivity problem as value patterns do, where attempting to move the -literal pattern out to a local variable for naming clarity would turn the -value checking literal pattern into a name binding capture pattern:: - - # PEP 634's literal pattern syntax - match expr: - case {"port": 443}: - ... # Handle the case where 'expr["port"] == 443' - case _: - ... # Handle the non-matching case - - HTTPS_PORT = 443 - match expr: - case {"port": HTTPS_PORT}: - ... # Matches any mapping with "port", binding its value to HTTPS_PORT - case _: - ... 
# Handle the non-matching case - -With equality constraints, this style of refactoring keeps the original -semantics (just as it would for a value lookup in any other statement):: - - # This PEP's equality constraints - match expr: - case {"port": ?443}: - ... # Handle the case where 'expr["port"] == 443' - case _: - ... # Handle the non-matching case - - HTTPS_PORT = 443 - match expr: - case {"port": ?HTTPS_PORT}: - ... # Handle the case where 'expr["port"] == 443' - case _: - ... # Handle the non-matching case +replaced by simple calls to the existing code for compiling expressions +(including in the AST validation pass, the AST optimization pass, the symbol +table analysis pass, and the code generation pass). This implementation +benefit would accrue not just to CPython, but to every other Python +implementation looking to add match statement support. Requiring the use of constraint prefix markers for mapping pattern keys @@ -816,7 +920,7 @@ literal or value patterns:: case {?constants.DEFAULT_PORT: sub_config, **rest}: process_config(sub_config, rest) -However, the extra character is syntactically noisy and unlike its use in +However, the extra character was syntactically noisy and unlike its use in constraint patterns (where it distinguishes them from capture patterns), the prefix doesn't provide any additional information here that isn't already conveyed by the expression's position as a key within a mapping pattern. @@ -839,8 +943,14 @@ PEP 634's ``BASE.ATTR as NAME``. This idea was dropped as it complicated the grammar for no gain in expressiveness over just using the general purpose approach to combining -capture patterns with other match patterns (i.e. ``?EXPR as NAME``) when the -identity of the matching object is important. +capture patterns with other match patterns (i.e. ``?EXPR as NAME`` at the +time, ``== EXPR as NAME`` now) when the identity of the matching object is +important. 
+ +This idea is even less appropriate after the switch to using existing comparison +operators as the marker prefix, as both ``NAME == EXPR`` and ``NAME is EXPR`` +would look like ordinary comparison operations, with nothing to suggest that +``NAME`` is being bound by the pattern matching process. Reference Implementation @@ -850,24 +960,23 @@ A reference implementation for this PEP [3_] has been derived from Brandt Bucher's reference implementation for PEP 634 [4_]. Relative to the text of this PEP, the draft reference implementation currently -retains literal patterns mostly as implemented for PEP 634, except that the -special casing of ``None``, ``True``, and ``False`` has been removed (with -``PEP 642 TODO`` notes added to the code that can be deleted once these patterns -are dropped entirely). +implements the variant of mapping patterns where numeric constraints are +accepted in addition to primary expressions (this allowed the PEP 634 mapping +pattern checks for complex keys to run as written). -Value patterns, wildcard patterns, and mapping patterns have been updated -to follow this PEP rather than PEP 634. +All other modified patterns have been updated to follow this PEP rather than +PEP 634. -Removing literal patterns will be a matter of deleting the code out of the -compiler, and then adding either ``?`` or ``?is`` as necessary to the test -cases that no longer compile. This removal isn't necessary to show that the -PEP's syntax proposal is feasible, so that work has been deferred for now. +The AST validator for match patterns has not yet been implemented. -There will also be an implementation decision to be made around representing +There is an implementation decision still to be made around representing constraint operators in the AST. 
The draft implementation adds them as new -cases on the existing ``UnaryOp`` node, but it would potentially be better to -implement them as a new ``Constraint`` node, since they're accepted at -different points in the syntax tree than other unary operators. +cases on the existing ``UnaryOp`` node, but there's an argument to be made that +they would be better implemented as a new ``Constraint`` node, since they're +accepted at different points in the syntax tree than other unary operators. +Making them a new node type would also allow an attribute to be added that +marked them as implicit or explicit nodes, which ``ast.unparse`` could use +to make the unparsed code look more like the original. Acknowledgments @@ -875,11 +984,20 @@ Acknowledgments The PEP 622 and PEP 634/635/636 authors, as the proposal in this PEP is merely an attempt to improve the readability of an already well-constructed idea by -proposing that one of the key new concepts in that proposal (the ability to -express value constraints in a name binding target) is sufficiently notable -to be worthy of using up one of the few remaining unused ASCII punctuation -characters in Python's syntax instead of reusing the existing attribute binding -syntax to mean an attribute lookup. +proposing that reusing the existing attribute binding syntax to mean an +attribute lookup will be more easily understood as syntactic sugar for a more +explicit underlying expression that's compatible with the existing binding +target syntax than it will be as the *only* way to spell such comparisons in +match patterns.
+ +Steven D'Aprano, who made a convincing case that the key goals of this PEP could +be achieved by using existing comparison tokens to add the ability to override +the compiler when our guesses as to "what most users will want most of the time" +are inevitably incorrect for at least some users some of the time, and retaining +some of PEP 634's syntactic sugar (with a slightly different semantic definition) +to obtain the same level of brevity as PEP 634 in most situations. (Paul +Sokolovsky also independently suggested using ``==`` instead of ``?`` as a +more easily understood prefix for equality constraints). References @@ -897,6 +1015,9 @@ References .. [4] PEP 634 reference implementation https://github.com/python/cpython/pull/22917 +.. [5] Steven D'Aprano's cogent criticism of the first published iteration of this PEP + https://mail.python.org/archives/list/python-dev@python.org/message/BTHFWG6MWLHALOD6CHTUFPHAR65YN6BP/ + .. _Appendix A: @@ -928,20 +1049,45 @@ Notation used beyond standard EBNF is as per PEP 534: or_pattern: '|'.closed_pattern+ closed_pattern: | capture_pattern - | constraint_pattern | wildcard_pattern + | constraint_pattern + | inferred_constraint_pattern | group_pattern | sequence_pattern | mapping_pattern | class_pattern - capture_pattern: NAME !('.' | '(' | '=') + capture_pattern: !"_" NAME !('.' | '(' | '=') - constraint_pattern: eq_constraint | id_constraint - id_constraint: '?' 'is' primary - eq_constraint: '?' primary + wildcard_pattern: "_" - wildcard_pattern: '?' + constraint_pattern: + | eq_constraint + | id_constraint + eq_constraint: '==' primary + id_constraint: 'is' primary + + inferred_constraint_pattern: + | inferred_id_constraint + | inferred_eq_constraint + + inferred_id_constraint[expr_ty]: + | 'None' + | '...' + + inferred_eq_constraint: + | attr_constraint + | numeric_constraint + | strings + + attr_constraint: attr !('.' | '(' | '=') + attr: name_or_attr '.'
NAME + name_or_attr: attr | NAME + numeric_constraint: + | signed_number !('+' | '-') + | signed_number '+' NUMBER + | signed_number '-' NUMBER + signed_number: NUMBER | '-' NUMBER group_pattern: '(' pattern ')' @@ -962,8 +1108,6 @@ Notation used beyond standard EBNF is as per PEP 534: class_pattern: | name_or_attr '(' [pattern_arguments ','?] ')' - attr: name_or_attr '.' NAME - name_or_attr: attr | NAME pattern_arguments: | positional_patterns [',' keyword_patterns] | keyword_patterns