From a4502e04d67dc0e10f64c2358d5bc39d35669430 Mon Sep 17 00:00:00 2001 From: Guido van Rossum Date: Mon, 19 Oct 2020 15:29:17 -0700 Subject: [PATCH] PEP 635: many improvements (#1663) * PEP 635: Tweaks markup Consistently Capitalize Headings. Remove extra blank lines (two is enough). Add a few TODOs. Fix a few typos. * Went over much of PEP 635 with a fine comb I got as far as capture patterns. * Tweak wildcard patterns (adding '?'); muse on 'else' * Reviewed up to and including sequence patterns * Checkpoint -- got halfway through Class Patterns * Changed Walrus to AS and added rationales (Tobias) * Fix AS-pattern example Co-authored-by: Tobias Kohn --- pep-0635.rst | 440 ++++++++++++++++++++++++++++++++++----------------- 1 file changed, 292 insertions(+), 148 deletions(-) diff --git a/pep-0635.rst b/pep-0635.rst index ca744baa2..659c486cb 100644 --- a/pep-0635.rst +++ b/pep-0635.rst @@ -15,7 +15,6 @@ Post-History: Resolution: - Abstract ======== @@ -31,7 +30,6 @@ TODO: Go over the feedback from the SC and make sure everything's somehow addressed. - Motivation ========== @@ -88,7 +86,7 @@ We believe that adding pattern matching to Python will enable Python users to write cleaner, more readable code for examples like those above, and many others. -Pattern matching and OO +Pattern Matching and OO ----------------------- Pattern matching is complimentary to the object-oriented paradigm. @@ -111,21 +109,31 @@ Like the Visitor pattern, pattern matching allows for a strict separation of concerns: specific actions or data processing is independent of the class hierarchy or manipulated objects. When dealing with predefined or even built-in classes, in particular, it is often impossible to add further -methods to the individual classes. Pattern matching not only releaves the +methods to the individual classes. 
Pattern matching not only relieves the programmer or class designer from the burden of the boilerplate code needed for the Visitor pattern, but is also flexible enough to directly work with built-in types. It naturally distinguishes between sequences of different -lengths, who might all share the same class despite obviously differing +lengths, which might all share the same class despite obviously differing structures. Moreover, pattern matching automatically takes inheritance into account: a class *D* inheriting from *C* will be handled by a pattern that targets *C* by default. +Object-oriented programming is geared towards single-dispatch: it is a +single instance (or the type thereof) that determines which method is to +be called. This leads to a somewhat artificial situation in the case of binary +operators where both objects might play an equal role in deciding which +implementation to use (Python addresses this through the use of reversed +binary methods). Pattern matching is structurally better suited to handle +such situations of multi-dispatch, where the action to be taken depends on +the types of several objects in equal parts. + TODO: Could we say more here? -Pattern and functional style ----------------------------- -Most Python applications and libraries are not written in a consistent +Patterns and Functional Style +----------------------------- + +Many Python applications and libraries are not written in a consistent OO style -- unlike Java, Python encourages defining functions at the top-level of a module, and for simple data structures, tuples (or named tuples or lists) and dictionaries are often used exclusively or @@ -146,33 +154,51 @@ programming style. Rationale ========= -TBD. - -This section should provide the rationale for individual design decisions. +This section provides the rationale for individual design decisions. It takes the place of "Rejected ideas" in the standard PEP format. 
It is organized in sections corresponding to the specification (PEP 634). +TODO: Cross-check against PEP 622 as well as (private) SC feedback. -Overview and terminology + +Overview and Terminology ------------------------ +TODO: What to put here? + +Much of the power of pattern matching comes from the nesting of subpatterns. +That the success of a pattern match depends directly on the success of +its subpatterns is thus a cornerstone of the design. However, although a +pattern like ``P(Q(), R())`` succeeds only if both subpatterns ``Q()`` +and ``R()`` succeed (i.e. the success of pattern ``P`` depends on ``Q`` +and ``R``), the pattern ``P`` is checked first. If ``P`` fails, neither +``Q()`` nor ``R()`` will be tried (this is a direct consequence of the +fact that if ``P`` fails, there are no subjects to match against ``Q()`` +and ``R()`` in the first place). + +Also note that patterns bind names to values rather than performing an +assignment. This reflects the fact that patterns aim to not have side +effects, which also means that Capture or AS patterns cannot assign a +value to an attribute or subscript. We thus consistently use the term +'bind' instead of 'assign' to emphasise this subtle difference between +traditional assignments and name binding in patterns. -The ``match`` statement ------------------------ +The Match Statement +------------------- The match statement evaluates an expression to produce a subject, finds the -first pattern that matches the subject and executes the associated block +first pattern that matches the subject, and executes the associated block of code. Syntactically, the match statement thus takes an expression and a sequence of case clauses, where each case clause comprises a pattern and a block of code. Since case clauses comprise a block of code, they adhere to the existing indentation scheme with the syntactic structure of -`` ...: <(indented) block>``, which in turn makes it a (compound) -statement. 
The chosen keyword ``case`` reflects its widespread use in +`` ...: <(indented) block>``, which resembles a compound +statement. The keyword ``case`` reflects its widespread use in pattern matching languages, ignoring those languages that use other -syntactic means such as a symbol like ``|`` because it would not fit +syntactic means such as a symbol like ``|``, because it would not fit established Python structures. The syntax of patterns following the keyword is discussed below. @@ -203,7 +229,7 @@ rules: ... This may look awkward to the eye of a Python programmer, because - everywhere else colon is followed by an indent. The ``match`` would + everywhere else a colon is followed by an indent. The ``match`` would neither follow the syntactic scheme of simple nor composite statements but rather establish a category of its own. @@ -229,9 +255,9 @@ Although flat indentation would save some horizontal space, the cost of increased complexity or unusual rules is too high. It would also complicate life for simple-minded code editors. Finally, the horizontal space issue can be alleviated by allowing "half-indent" (i.e. two spaces instead of four) -for match statements. +for match statements (though we do not recommend this). -In sample programs using match, written as part of the development of this +In sample programs using ``match``, written as part of the development of this PEP, a noticeable improvement in code brevity is observed, more than making up for the additional indentation level. @@ -239,7 +265,7 @@ up for the additional indentation level. *Statement vs. Expression.* Some suggestions centered around the idea of making ``match`` an expression rather than a statement. However, this would fit poorly with Python's statement-oriented nature and lead to -unusually long and complex expressions with the need to invent new +unusually long and complex expressions and the need to invent new syntactic constructs or break well established syntactic rules. 
An obvious consequence of ``match`` as an expression would be that case clauses could no longer have abitrary blocks of code attached, but only @@ -247,8 +273,46 @@ a single expression. Overall, the strong limitations could in no way offset the slight simplification in some special use cases. +*Hard vs. Soft Keyword.* There were options to make ``match`` a hard keyword, +or choose a different keyword. Although using a hard keyword would simplify +life for simple-minded syntax highlighters, we decided not to use a hard +keyword for several reasons: -Match semantics +- Most importantly, the new parser doesn't require us to do this. Unlike + ``async``, which caused hardships as a soft keyword for a few + releases, here we can make ``match`` a permanent soft keyword. + +- ``match`` is so commonly used in existing code that making it a hard keyword would break + almost every existing program and would put the burden of fixing code on many + people who may not even benefit from the new syntax. + +- It is hard to find an alternative keyword that would not be commonly used + in existing programs as an identifier, and would still clearly reflect the + meaning of the statement. + + +**Use "as" or "|" instead of "case" for case clauses.** +The pattern matching proposed here is a combination of multi-branch control +flow (in line with ``switch`` in Algol-derived languages or ``cond`` in Lisp) +and object-deconstruction as found in functional languages. While the proposed +keyword ``case`` highlights the multi-branch aspect, alternative keywords such +as ``as`` would equally be possible, highlighting the deconstruction aspect. +``as`` or ``with``, for instance, also have the advantage of already being +keywords in Python. However, since ``case`` as a keyword can only occur as a +leading keyword inside a ``match`` statement, it is easy for a parser to +distinguish between its use as a keyword or as a variable. 
+ +Other variants would use a symbol like ``|`` or ``=>``, or go entirely without +special marker. + +Since Python is a statement-oriented language in the tradition of Algol, and as +each composite statement starts with an identifying keyword, ``case`` seemed to +be most in line with Python's style and traditions. + + + + +Match Semantics ~~~~~~~~~~~~~~~ The patterns of different case clauses might overlap in that more than @@ -290,8 +354,8 @@ unintuitive and surprising behavior. A direct consequence of this is that any variable bindings outlive the respective case or match statements. Even patterns that only match a subject partially might bind local variables (this is, in fact, necessary -for guards to function properly). However, this escaping of variable -bindings is in line with existing Python structures such as for loops and +for guards to function properly). However, these semantics for variable +binding are in line with existing Python structures such as for loops and with statements. @@ -301,9 +365,9 @@ Guards Some constraints cannot be adequately expressed through patterns alone. For instance, a 'less' or 'greater than' relationship defies the usual 'equal' semantics of patterns. Moreover, different subpatterns are -independent and cannot refer to each other. The addition of _guards_ +independent and cannot refer to each other. The addition of *guards* addresses these restrictions: a guard is an arbitrary expression attached -to a pattern and that must evaluate to ``True`` for the pattern to succeed. +to a pattern and that must evaluate to a "truthy" value for the pattern to succeed. For example, ``case [x, y] if x < y:`` uses a guard (``if x < y``) to express a 'less than' relationship between two otherwise disjoint capture @@ -312,15 +376,15 @@ patterns ``x`` and ``y``. From a conceptual point of view, patterns describe structural constraints on the subject in a declarative style, ideally without any side-effects. 
Recall, in particular, that patterns are clearly distinct from expressions, -following different objectives and semantics. Guards then enhance the -patterns in a highly controlled way with arbitrary expressions (that might -have side effects). Splitting the overal pattern into a static structural -and a dynamic 'evaluative' part not only helps with readability, but can +following different objectives and semantics. Guards then enhance case +blocks in a highly controlled way with arbitrary expressions (that might +have side effects). Splitting the overall functionality into a static structural +and a dynamically evaluated part not only helps with readability, but can also introduce dramatic potential for compiler optimizations. To keep this clear separation, guards are only supported on the level of case clauses and not for individual patterns. -Example using guards:: +**Example** using guards:: def sort(seq): match seq: @@ -354,64 +418,84 @@ seen as a prototype to pattern matching in Python, there is only one Full pattern matching differs from this in that there is more variety in structual patterns but only a minimum of binding patterns. -Patterns differ from assignment targets (as in iterable unpacking) in that -they impose additional constraints on the structure of the subject and in -that a subject might safely fail to match a specific pattern at any point +Patterns differ from assignment targets (as in iterable unpacking) in two ways: +they impose additional constraints on the structure of the subject, and +a subject may safely fail to match a specific pattern at any point (in iterable unpacking, this constitutes an error). The latter means that -pattern should avoid side effects wherever possible, including binding -values to attributes or subscripts. +patterns should avoid side effects wherever possible. 
+ +This desire to avoid side effects is one reason why capture patterns +don't allow binding values to attributes or subscripts: if the +containing pattern were to fail in a later step, it would be hard to +revert such bindings. A cornerstone of pattern matching is the possibility of arbitrarily -*nesting patterns*. The nesting allows for expressing deep +*nesting patterns*. The nesting allows expressing deep tree structures (for an example of nested class patterns, see the motivation section above) as well as alternatives. -Although the structural patterns might superficially look like expressions, +Although patterns might superficially look like expressions, it is important to keep in mind that there is a clear distinction. In fact, no pattern is or contains an expression. It is more productive to think of patterns as declarative elements similar to the formal parameters in a function definition. -Walrus/AS patterns -~~~~~~~~~~~~~~~~~~ +AS Patterns +~~~~~~~~~~~ Patterns fall into two categories: most patterns impose a (structural) constraint that the subject needs to fulfill, whereas the capture pattern binds the subject to a name without regard for the subject's structure or actual value. Consequently, a pattern can either express a constraint or -bind a value, but not both. Walrus/AS patterns fill this gap in that they +bind a value, but not both. AS patterns fill this gap in that they allow the user to specify a general pattern as well as capture the subject in a variable. -Typical use cases for the Walrus/AS pattern include OR and Class patterns -together with a binding name as in, e.g., ``case BinOp(op := '+'|'-', ...):`` -or ``case [first := int(), second := int()]:``. The latter could be +Typical use cases for the AS pattern include OR and Class patterns +together with a binding name as in, e.g., ``case BinOp('+'|'-' as op, ...):`` +or ``case [int() as first, int() as second]:``. 
The latter could be understood as saying that the subject must fulfil two distinct pattern: -``[first, second]`` as well as ``[int(), int()]``. The Walrus/AS pattern +``[first, second]`` as well as ``[int(), int()]``. The AS pattern can thus be seen as a special case of an 'and' pattern (see OR patterns below for an additional discussion of 'and' patterns). -Example using the Walrus/AS pattern:: +In an earlier version, the AS pattern was devised as a 'Walrus pattern', +written as ``case [first:=int(), second:=int()]``. However, using ``as`` +offers some advantages over ``:=``: + +- The walrus operator ``:=`` is used to capture the result of an expression + on the right-hand side, whereas ``as`` generally indicates some form of + 'processing' as in ``import foo as bar`` or ``except E as err:``. Indeed, + the pattern ``P as x`` does not assign the pattern ``P`` to ``x``, but + rather the subject that successfully matches ``P``. + +- ``as`` allows for a more consistent data flow from left to right (the + attributes in Class patterns also follow a left-to-right data flow). + +- The walrus operator looks very similar to the syntax for attributes in the Class pattern, + potentially leading to some confusion. + +**Example** using the AS pattern:: def simplify_expr(tokens): match tokens: - case [l:=('('|'['), *expr, r:=(')'|']')] if (l+r) in ('()', '[]'): + case [('('|'[') as l, *expr, (')'|']') as r] if (l+r) in ('()', '[]'): return simplify_expr(expr) - case [0, op:=('+'|'-'), right]: + case [0, ('+'|'-') as op, right]: return UnaryOp(op, right) - case [left:=(int() | float()) | Num(left), '+', right:=(int() | float()) | Num(right)]: + case [(int() | float() as left) | Num(left), '+', (int() | float() as right) | Num(right)]: return Num(left + right) - case [value:=(int() | float())] + case [(int() | float()) as value]: return Num(value) -OR patterns +OR Patterns ~~~~~~~~~~~ The OR pattern allows you to combine 'structurally equivalent' alternatives into a new pattern, i.e. 
several patterns can share a common handler. If any -one of an OR pattern's subpatterns matches the given subject, the entire OR +of an OR pattern's subpatterns matches the subject, the entire OR pattern succeeds. Statically typed languages prohibit the binding of names (capture patterns) @@ -422,13 +506,16 @@ must bind the same set of variables so as not to leave potentially undefined names. With two alternatives ``P | Q``, this means that if *P* binds the variables *u* and *v*, *Q* must bind exactly the same variables *u* and *v*. -There was some discussion on whether to use the bar ``|`` or the keyword -``or`` in order to separate alternatives. The OR pattern does not fully fit +There was some discussion on whether to use the bar symbol ``|`` or the ``or`` +keyword to separate alternatives. The OR pattern does not fully fit the existing semantics and usage of either of these two symbols. However, ``|`` is the symbol of choice in all programming languages with support of -the OR pattern and is even used in that capacity for regular expressions in -Python as well. Moreover, ``|`` is not only used for bitwise OR, but also +the OR pattern and is used in that capacity for regular expressions in +Python as well. It is also the traditional separator between alternatives +in formal grammars (including Python's). +Moreover, ``|`` is not only used for bitwise OR, but also for set unions and dict merging (:pep:`584`). + Other alternatives were considered as well, but none of these would allow OR-patterns to be nested inside other patterns: @@ -468,8 +555,9 @@ OR-patterns to be nested inside other patterns: print("A corner of the unit square") -*AND and NOT patterns.* -This proposal defines an OR-pattern (|) to match one of several alternates; +**AND and NOT Patterns** + +Since this proposal defines an OR-pattern (``|``) to match one of several alternates, why not also an AND-pattern (``&``) or even a NOT-pattern (``!``)? 
Especially given that some other languages (``F#`` for example) support AND-patterns. @@ -480,27 +568,28 @@ all attributes and elements mentioned must be present for the match to succeed. Guard conditions can also support many of the use cases that a hypothetical 'and' operator would be used for. -A negation of a match pattern using the operator ``!`` as a prefix would match -exactly if the pattern itself does not match. For instance, ``!(3 | 4)`` -would match anything except ``3`` or ``4``. However, there is evidence from -other languages that this is rarely useful and primarily used as double -negation ``!!`` to control variable scopes and prevent variable bindings -(which does not apply to Python). Other use cases are better expressed using -guards. +A negation of a match pattern using the operator ``!`` as a prefix +would match exactly if the pattern itself does not match. For +instance, ``!(3 | 4)`` would match anything except ``3`` or ``4``. +However, there is `evidence from other languages +`_ that this is +rarely useful, and primarily used as double negation ``!!`` to control +variable scopes and prevent variable bindings (which does not apply to +Python). Other use cases are better expressed using guards. In the end, it was decided that this would make the syntax more complex -without adding a significant benefit. +without adding a significant benefit. It can always be added later. -Example using the OR pattern:: +**Example** using the OR pattern:: def simplify(expr): match expr: case ('/', 0, 0): return expr - case ('*' | '/', 0, _): + case ('*'|'/', 0, _): return 0 - case ('+' | '-', x, 0) | ('+', 0, x) | ('*', 1, x) | ('*' | '/', x, 1): + case ('+'|'-', x, 0) | ('+', 0, x) | ('*', 1, x) | ('*'|'/', x, 1): return x return expr @@ -511,8 +600,8 @@ Literal Patterns ~~~~~~~~~~~~~~~~ Literal patterns are a convenient way for imposing constraints on the -value of a subject, rather than its type or structure. 
Literal patterns -even allow you to emulate a switch statement using pattern matching. +value of a subject, rather than its type or structure. They also +allow you to emulate a switch statement using pattern matching. Generally, the subject is compared to a literal pattern by means of standard equality (``x == y`` in Python syntax). Consequently, the literal patterns @@ -522,7 +611,7 @@ match the same set of objects because ``True == 1`` holds. However, we believe that many users would be surprised finding that ``case True:`` matched the subject ``1.0``, resulting in some subtle bugs and convoluted workarounds. We therefore adopted the rule that the three singleton -objects ``None``, ``False`` and ``True`` match by identity (``x is y`` in +patterns ``None``, ``False`` and ``True`` match by identity (``x is y`` in Python syntax) rather than equality. Hence, ``case True:`` will match only ``True`` and nothing else. Note that ``case 1:`` would still match ``True``, though, because the literal pattern ``1`` works by equality and not identity. @@ -530,20 +619,22 @@ though, because the literal pattern ``1`` works by equality and not identity. Early ideas to induce a hierarchy on numbers so that ``case 1.0`` would match both the integer ``1`` and the floating point number ``1.0``, whereas ``case 1:`` would only match the integer ``1`` were eventually dropped in -favor of the simpler and consistent rule based on equality. Moreover, any +favor of the simpler and more consistent rule based on equality. Moreover, any additional checks whether the subject is an instance of ``numbers.Integral`` would come at a high runtime cost to introduce what would essentially be -novel in Python. When needed, the explicit syntax ``case int(1):`` might +a novel idea in Python. When needed, the explicit syntax ``case int(1):`` can be used. -Recall that literal patterns are *not* expressions, but directly denote a -specific value or object. 
From a syntactical point of view, we have to -ensure that negative and complex numbers can equally be used as patterns, -although they are not atomic literal values (i.e. the seeming literal value -``-3+4j`` would syntactically be an expression of the form -``BinOp(UnaryOp('-', 3), '+', 4j)``, but as expressions are not part of -patterns, we added syntactic support for such complex value literals without -having to resort to full expressions). Interpolated *f*-strings, on the +Recall that literal patterns are *not* expressions, but directly +denote a specific value. From a pragmatic point of view, we want to +allow using negative and even complex values as literal patterns, but +they are not atomic literals (only unsigned real and imaginary numbers +are). E.g., ``-3+4j`` is syntactically an expression of the form +``BinOp(UnaryOp('-', 3), '+', 4j)``. Since expressions are not part +of patterns, we had to add explicit syntactic support for such values +without having to resort to full expressions. + +Interpolated *f*-strings, on the other hand, are not literal values, despite their appearance and can therefore not be used as literal patterns (string concatenation, however, is supported). @@ -551,7 +642,27 @@ is supported). Literal patterns not only occur as patterns in their own right, but also as keys in *mapping patterns*. -Example using Literal patterns:: + +**Range matching patterns.** +This would allow patterns such as ``1...6``. However, there are a host of +ambiguities: + +* Is the range open, half-open, or closed? (I.e. is ``6`` included in the + above example or not?) +* Does the range match a single number, or a range object? +* Range matching is often used for character ranges ('a'...'z') but that + won't work in Python since there's no character data type, just strings. 
+* Range matching can be a significant performance optimization if you can + pre-build a jump table, but that's not generally possible in Python due + to the fact that names can be dynamically rebound. + +Rather than creating a special-case syntax for ranges, it was decided +that allowing custom pattern objects (``InRange(0, 6)``) would be more flexible +and less ambiguous; however those ideas have been postponed for the time +being. + + +**Example** using Literal patterns:: def simplify(expr): match expr: @@ -579,7 +690,7 @@ Capture Patterns Capture patterns take on the form of a name that accepts any value and binds it to a (local) variable (unless the name is declared as ``nonlocal`` or -``global``). In that sense, a simple capture pattern is basically equivalent +``global``). In that sense, a capture pattern is similar to a parameter in a function definition (when the function is called, each parameter binds the respective argument to a local variable in the function's scope). @@ -599,7 +710,7 @@ repeated use of names later on. There were calls to explicitly mark capture patterns and thus identify them as binding targets. According to that idea, a capture pattern would be written as, e.g. ``?x``, ``$x`` or ``=x``. The aim of such explicit capture -markers is to let an unmarked name be a constant value pattern (see below). +markers is to let an unmarked name be a value pattern (see below). However, this is based on the misconception that pattern matching was an extension of *switch* statements, placing the emphasis on fast switching based on (ordinal) values. Such a *switch* statement has indeed been proposed for @@ -611,7 +722,13 @@ betray the objective of the proposed pattern matching syntax and simplify a secondary use case at the expense of additional syntactic clutter for core cases. 
-Example using Capture patterns:: +It has been proposed that capture patterns are not needed at all, +since the equivalent effect can be obtained by combining an AS +pattern with a wildcard pattern (e.g., ``case _ as x`` is equivalent +to ``case x``). However, this would be unpleasantly verbose, +especially given that we expect capture patterns to be very common. + +**Example** using Capture patterns:: def average(*args): match args: @@ -621,8 +738,8 @@ Example using Capture patterns:: return x case []: return 0 - case x: # captures the entire sequence - return sum(x) / len(x) + case a: # captures the entire sequence + return sum(a) / len(a) .. _wildcard_pattern: @@ -660,27 +777,38 @@ of items is omitted:: case [a, ..., z]: ... case [a, *, z]: ... -Both examples look like the would match a sequence of at two or more items, -capturing the first and last values. +Either example looks like it would match a sequence of two or more +items, capturing the first and last values. While that may be the +ultimate "wildcard", it does not convey the desired semantics. -A single wildcard clause (i.e. ``case _:``) is semantically equivalent to -an ``else:``. It accepts any subject without binding it to a variable or -performing any other operation. However, the wildcard pattern is in -contrast to ``else`` usable as a subpattern in nested patterns. +An alternative that does not suggest an arbitrary number of items +would be ``?``. However, this would require changes in the tokenizer, +and it would put Python in a rather unique position: -Finally note that the underscore is as a wildcard pattern in *every* +The underscore is used as a wildcard pattern in *every* programming language with pattern matching that we could find (including *C#*, *Elixir*, *Erlang*, *F#*, *Grace*, *Haskell*, *Mathematica*, *OCaml*, *Ruby*, *Rust*, *Scala*, *Swift*, and *Thorn*). 
Keeping in mind that many users of Python also work with other programming -languages, have prior experience when learning Python, or moving on to -other languages after having learnt Python, we find that such well -established standards are important and relevant with respect to +languages, have prior experience when learning Python, and may move on to +other languages after having learned Python, we find that such +well-established standards are important and relevant with respect to readability and learnability. In our view, concerns that this wildcard means that a regular name received special treatment are not strong enough to introduce syntax that would make Python special. -Example using the Wildcard pattern:: +*Else blocks.* A case block without a guard whose pattern is a single +wildcard (i.e., ``case _:``) accepts any subject without binding it to +a variable or performing any other operation. It is thus semantically +equivalent to ``else:``, if it were supported. However, adding such +an else block to the match statement syntax would not remove the need +for the wildcard pattern in other contexts. Another argument against +this is that there would be two plausible indentation levels for an +else block: aligned with ``case`` or aligned with ``match``. The +authors have found it quite contentious which indentation level to +prefer. + +**Example** using the Wildcard pattern:: def is_closed(sequence): match sequence: @@ -692,45 +820,43 @@ Example using the Wildcard pattern:: return False -.. _constant_value_pattern: +.. _value_pattern: Value Patterns ~~~~~~~~~~~~~~ It is good programming style to use named constants for parametric values or -to clarify the meaning of particular values. Clearly, it would be desirable -to write ``case (HttpStatus.OK, body):`` rather than +to clarify the meaning of particular values. Clearly, it would be preferable +to write ``case (HttpStatus.OK, body):`` over ``case (200, body):``, for example. 
The main issue that arises here is how to -distinguish capture patterns (variables) from constant value patterns. The +distinguish capture patterns (variable bindings) from value patterns. The general discussion surrounding this issue has brought forward a plethora of options, which we cannot all fully list here. -Strictly speaking, constant value patterns are not really necessary, but +Strictly speaking, value patterns are not really necessary, but could be implemented using guards, i.e. ``case (status, body) if status == HttpStatus.OK:``. Nonetheless, the -convenience of constant value patterns is unquestioned and obvious. +convenience of value patterns is unquestioned and obvious. The observation that constants tend to be written in uppercase letters or collected in enumeration-like namespaces suggests possible rules to discern -constants syntactically. However, the idea of using upper vs. lower case as +constants syntactically. However, the idea of using upper- vs. lowercase as a marker has been met with scepticism since there is no similar precedence in core Python (although it is common in other languages). We therefore only -adopted the rule that any dotted name (i.e. attribute access) is to be -interpreted as a constant value pattern like ``HttpStatus.OK`` -above. This precludes, in particular, local variables from acting as -constants. +adopted the rule that any dotted name (i.e., attribute access) is to be +interpreted as a value pattern, for example ``HttpStatus.OK`` +above. This precludes, in particular, local variables and global +variables defined in the current module from acting as constants. -Global variables can only be directly used as constant when defined in other -modules, although there are workarounds to access the current module as a -namespace as well. A proposed rule to use a leading dot (e.g. +A proposed rule to use a leading dot (e.g. 
``.CONSTANT``) for that purpose was criticized because it was felt that the dot would not be a visible-enough marker for that purpose. Partly inspired -by use cases in other programming languages, a number of different +by forms found in other programming languages, a number of different markers/sigils were proposed (such as ``^CONSTANT``, ``$CONSTANT``, ``==CONSTANT``, ``CONSTANT?``, or the word enclosed in backticks), although there was no obvious or natural choice. The current proposal therefore leaves the discussion and possible introduction of such a 'constant' marker -for future PEPs. +for a future PEP. Distinguishing the semantics of names based on whether it is a global variable (i.e. the compiler would treat global variables as constants rather @@ -740,7 +866,7 @@ patterns. Moreover, pattern matching could not be used directly inside a module's scope because all variables would be global, making capture patterns impossible. -Example using the Value pattern:: +**Example** using the Value pattern:: def handle_reply(reply): match reply: @@ -782,7 +908,7 @@ iterable. possible. - A starred pattern will capture a sub-sequence of arbitrary length, - mirroring iterable unpacking as well. Only one starred item may be + again mirroring iterable unpacking. Only one starred item may be present in any sequence pattern. In theory, patterns such as ``(*_, 3, *_)`` could be understood as expressing any sequence containing the value ``3``. In practice, however, this would only work for a very narrow set of use @@ -790,31 +916,36 @@ iterable. - The sequence pattern does *not* iterate through an iterable subject. All elements are accessed through subscripting and slicing, and the subject must - be an instance of ``collections.abc.Sequence`` (including, in particular, - lists and tuples, but excluding strings and bytes, as well as sets and - dictionaries). + be an instance of ``collections.abc.Sequence``. This includes, of course, + lists and tuples, but excludes e.g.
sets and dictionaries. While it would + include strings and bytes, we make an exception for these (see below). A sequence pattern cannot just iterate through any iterable object. The consumption of elements from the iteration would have to be undone if the -overall pattern fails, which is not possible. +overall pattern fails, which is not feasible. -Relying on ``len()`` and subscripting and slicing alone does not work to -identify sequences because sequences share the protocol with more general -maps (dictionaries) in this regard. It would be surprising if a sequence -pattern also matched dictionaries or other custom objects that implement +To identify sequences we cannot rely on ``len()`` and subscripting and +slicing alone, because sequences share these protocols with mappings +(e.g. ``dict``) in this regard. It would be surprising if a sequence +pattern also matched dictionaries or other objects implementing the mapping protocol (i.e. ``__getitem__``). The interpreter therefore performs an instance check to ensure that the subject in question really -is a sequence (of known type). +is a sequence (of known type). (As an optimization of the most common +case, if the subject is exactly a list or a tuple, the instance check +can be skipped.) String and bytes objects have a dual nature: they are both 'atomic' objects in their own right, as well as sequences (with a strongly recursive nature in that a string is a sequence of strings). The typical behavior and use -cases for strings and bytes are different enough from that of tuples and +cases for strings and bytes are different enough from those of tuples and lists to warrant a clear distinction. It is in fact often unintuitive and -unintended that strings pass for sequences as evidenced by regular questions +unintended that strings pass for sequences, as evidenced by regular questions and complaints.
Strings and bytes are therefore not matched by a sequence pattern, limiting the sequence pattern to a very specific understanding of -'sequence'. +'sequence'. The built-in ``bytearray`` type, being a mutable version of +``bytes``, also deserves an exception; but we don't intend to +enumerate all other types that may be used to represent bytes +(e.g. some, but not all, instances of ``memoryview`` and ``array.array``). .. _mapping_pattern: @@ -823,9 +954,9 @@ Mapping Patterns ~~~~~~~~~~~~~~~~ Dictionaries or mappings in general are one of the most important and most -widely used data structures in Python. In contrast to sequences mappings -are built for fast direct access to arbitrary elements (identified by a key). -In most use cases an element is retrieved from a dictionary by a known key +widely used data structures in Python. In contrast to sequences, mappings +are built for fast direct access to arbitrary elements identified by a key. +In most cases an element is retrieved from a dictionary by a known key without regard for any ordering or other key-value pairs stored in the same dictionary. Particularly common are string keys. @@ -836,13 +967,13 @@ pattern does not check for the presence of additional keys. Should it be necessary to impose an upper bound on the mapping and ensure that no additional keys are present, then the usual double-star-pattern ``**rest`` can be used. The special case ``**_`` with a wildcard, however, is not -supported as it would not have any effect, but might lead to a wrong +supported as it would not have any effect, but might lead to an incorrect understanding of the mapping pattern's semantics. To avoid overly expensive matching algorithms, keys must be literals or -constant values. +value patterns. 
-Example using the Mapping pattern:: +**Example** using the Mapping pattern:: def change_red_to_blue(json_obj): match json_obj: @@ -858,10 +989,10 @@ Example using the Mapping pattern:: Class Patterns ~~~~~~~~~~~~~~ -Class patterns fulfil two purposes: checking whether a given subject is -indeed an instance of a specific class and extracting data from specific -attributes of the subject. A quick survey revealed that ``isinstance()`` -is indeed one of the most often used functions in Python in terms of +Class patterns fulfill two purposes: checking whether a given subject is +indeed an instance of a specific class, and extracting data from specific +attributes of the subject. Anecdotal evidence revealed that ``isinstance()`` +is one of the most often used functions in Python in terms of static occurrences in programs. Such instance checks typically precede a subsequent access to information stored in the object, or a possible manipulation thereof. A typical pattern might be along the lines of:: @@ -873,7 +1004,7 @@ manipulation thereof. A typical pattern might be along the lines of:: elif isinstance(node, Leaf): print(node.value) -In many cases, however, class patterns occur nested as in the example +In many cases class patterns occur nested, as in the example given in the motivation:: if (isinstance(node, BinOp) and node.op == "+" @@ -881,8 +1012,8 @@ given in the motivation:: a, b, c = node.left, node.right.left, node.right.right # Handle a + b*c -The class pattern lets you concisely specify both an instance-check as -well as relevant attributes (with possible further constraints). It is +The class pattern lets you concisely specify both an instance check +and relevant attributes (with possible further constraints). It is thereby very tempting to write, e.g., ``case Node(left, right):`` in the first case above and ``case Leaf(value):`` in the second. 
While this indeed works well for languages with strict algebraic data types, it is @@ -890,14 +1021,14 @@ problematic with the structure of Python objects. When dealing with general Python objects, we face a potentially very large number of unordered attributes: an instance of ``Node`` contains a large -number of attributes (most of which are 'private methods' such as, e.g., -``__repr__``). Moreover, the interpreter cannot reliably deduce which of -the attributes comes first and which comes second. For an object that +number of attributes (most of which are 'special methods' such as +``__repr__``). Moreover, the interpreter cannot reliably deduce the +ordering of attributes. For an object that represents a circle, say, there is no inherently obvious ordering of the attributes ``x``, ``y`` and ``radius``. We envision two possibilities for dealing with this issue: either explicitly -name the attributes of interest or provide an additional mapping that tells +name the attributes of interest, or provide an additional mapping that tells the interpreter which attributes to extract and in which order. Both approaches are supported. Moreover, explicitly naming the attributes of interest lets you further specify the required structure of an object; if @@ -948,6 +1079,20 @@ the explicit construction of instances, where class patterns ``c(p, q)`` deliberately mirror the syntax of creating instances. +**Type annotations for pattern variables.** +The proposal was to combine patterns with type annotations:: + + match x: + case [a: int, b: str]: print(f"An int {a} and a string {b}:) + case [a: int, b: int, c: int]: print(f"Three ints", a, b, c) + ... + +This idea has a lot of problems. For one, the colon can only +be used inside of brackets or parens, otherwise the syntax becomes +ambiguous. And because Python disallows ``isinstance()`` checks +on generic types, type annotations containing generics will not +work as expected. 
+ History and Context =================== @@ -1052,7 +1197,6 @@ Or you would combine these ideas to write ``Node(right=y)`` so as to require an instance of ``Node`` but only extract the value of the ``right`` attribute. - Copyright =========