PEP 635: many improvements (#1663)

* PEP 635: Tweaks markup

Consistently Capitalize Headings.
Remove extra blank lines (two is enough).
Add a few TODOs.
Fix a few typos.

* Went over much of PEP 635 with a fine comb

I got as far as capture patterns.

* Tweak wildcard patterns (adding '?'); muse on 'else'

* Reviewed up to and including sequence patterns

* Checkpoint -- got halfway through Class Patterns

* Changed Walrus to AS and added rationales (Tobias)

* Fix AS-pattern example

Co-authored-by: Tobias Kohn <webmaster@tobiaskohn.ch>
Guido van Rossum 2020-10-19 15:29:17 -07:00 committed by GitHub
parent 0181d5c214
commit a4502e04d6
1 changed files with 292 additions and 148 deletions

@ -15,7 +15,6 @@ Post-History:
Resolution: Resolution:
Abstract Abstract
======== ========
@ -31,7 +30,6 @@ TODO: Go over the feedback from the SC and make sure everything's
somehow addressed. somehow addressed.
Motivation Motivation
========== ==========
@ -88,7 +86,7 @@ We believe that adding pattern matching to Python will enable Python
users to write cleaner, more readable code for examples like those users to write cleaner, more readable code for examples like those
above, and many others. above, and many others.
Pattern matching and OO Pattern Matching and OO
----------------------- -----------------------
Pattern matching is complementary to the object-oriented paradigm. Pattern matching is complementary to the object-oriented paradigm.
@ -111,21 +109,31 @@ Like the Visitor pattern, pattern matching allows for a strict separation
of concerns: specific actions or data processing is independent of the of concerns: specific actions or data processing is independent of the
class hierarchy or manipulated objects. When dealing with predefined or class hierarchy or manipulated objects. When dealing with predefined or
even built-in classes, in particular, it is often impossible to add further even built-in classes, in particular, it is often impossible to add further
methods to the individual classes. Pattern matching not only releaves the methods to the individual classes. Pattern matching not only relieves the
programmer or class designer from the burden of the boilerplate code needed programmer or class designer from the burden of the boilerplate code needed
for the Visitor pattern, but is also flexible enough to directly work with for the Visitor pattern, but is also flexible enough to directly work with
built-in types. It naturally distinguishes between sequences of different built-in types. It naturally distinguishes between sequences of different
lengths, who might all share the same class despite obviously differing lengths, which might all share the same class despite obviously differing
structures. Moreover, pattern matching automatically takes inheritance structures. Moreover, pattern matching automatically takes inheritance
into account: a class *D* inheriting from *C* will be handled by a pattern into account: a class *D* inheriting from *C* will be handled by a pattern
that targets *C* by default. that targets *C* by default.
Object oriented programming is geared towards single-dispatch: it is a
single instance (or the type thereof) that determines which method is to
be called. This leads to a somewhat artificial situation in case of binary
operators where both objects might play an equal role in deciding which
implementation to use (Python addresses this through the use of reversed
binary methods). Pattern matching is structurally better suited to handle
such situations of multi-dispatch, where the action to be taken depends on
the types of several objects to equal parts.
TODO: Could we say more here? TODO: Could we say more here?
Pattern and functional style
----------------------------
Most Python applications and libraries are not written in a consistent Patterns and Functional Style
-----------------------------
Many Python applications and libraries are not written in a consistent
OO style -- unlike Java, Python encourages defining functions at the OO style -- unlike Java, Python encourages defining functions at the
top-level of a module, and for simple data structures, tuples (or top-level of a module, and for simple data structures, tuples (or
named tuples or lists) and dictionaries are often used exclusively or named tuples or lists) and dictionaries are often used exclusively or
@ -146,33 +154,51 @@ programming style.
Rationale Rationale
========= =========
TBD. This section provides the rationale for individual design decisions.
This section should provide the rationale for individual design decisions.
It takes the place of "Rejected ideas" in the standard PEP format. It takes the place of "Rejected ideas" in the standard PEP format.
It is organized in sections corresponding to the specification (PEP 634). It is organized in sections corresponding to the specification (PEP 634).
TODO: Cross-check against PEP 622 as well as (private) SC feedback.
Overview and terminology
Overview and Terminology
------------------------ ------------------------
TODO: What to put here?
Much of the power of pattern matching comes from the nesting of subpatterns.
That the success of a pattern match depends directly on the success of
its subpatterns is thus a cornerstone of the design. However, although a
pattern like ``P(Q(), R())`` succeeds only if both subpatterns ``Q()``
and ``R()`` succeed (i.e. the success of pattern ``P`` depends on ``Q``
and ``R``), the pattern ``P`` is checked first. If ``P`` fails, neither
``Q()`` nor ``R()`` will be tried (this is a direct consequence of the
fact that if ``P`` fails, there are no subjects to match against ``Q()``
and ``R()`` in the first place).
Also note that patterns bind names to values rather than performing an
assignment. This reflects the fact that patterns aim to not have side
effects, which also means that Capture or AS patterns cannot assign a
value to an attribute or subscript. We thus consistently use the term
'bind' instead of 'assign' to emphasise this subtle difference between
traditional assignments and name binding in patterns.
The ``match`` statement The Match Statement
----------------------- -------------------
The match statement evaluates an expression to produce a subject, finds the The match statement evaluates an expression to produce a subject, finds the
first pattern that matches the subject and executes the associated block first pattern that matches the subject, and executes the associated block
of code. Syntactically, the match statement thus takes an expression and of code. Syntactically, the match statement thus takes an expression and
a sequence of case clauses, where each case clause comprises a pattern and a sequence of case clauses, where each case clause comprises a pattern and
a block of code. a block of code.
Since case clauses comprise a block of code, they adhere to the existing Since case clauses comprise a block of code, they adhere to the existing
indentation scheme with the syntactic structure of indentation scheme with the syntactic structure of
``<keyword> ...: <(indented) block>``, which in turn makes it a (compound) ``<keyword> ...: <(indented) block>``, which resembles a compound
statement. The chosen keyword ``case`` reflects its widespread use in statement. The keyword ``case`` reflects its widespread use in
pattern matching languages, ignoring those languages that use other pattern matching languages, ignoring those languages that use other
syntactic means such as a symbol like ``|`` because it would not fit syntactic means such as a symbol like ``|``, because it would not fit
established Python structures. The syntax of patterns following the established Python structures. The syntax of patterns following the
keyword is discussed below. keyword is discussed below.
@ -203,7 +229,7 @@ rules:
... ...
This may look awkward to the eye of a Python programmer, because This may look awkward to the eye of a Python programmer, because
everywhere else colon is followed by an indent. The ``match`` would everywhere else a colon is followed by an indent. The ``match`` would
neither follow the syntactic scheme of simple nor composite statements neither follow the syntactic scheme of simple nor composite statements
but rather establish a category of its own. but rather establish a category of its own.
@ -229,9 +255,9 @@ Although flat indentation would save some horizontal space, the cost of
increased complexity or unusual rules is too high. It would also complicate increased complexity or unusual rules is too high. It would also complicate
life for simple-minded code editors. Finally, the horizontal space issue can life for simple-minded code editors. Finally, the horizontal space issue can
be alleviated by allowing "half-indent" (i.e. two spaces instead of four) be alleviated by allowing "half-indent" (i.e. two spaces instead of four)
for match statements. for match statements (though we do not recommend this).
In sample programs using match, written as part of the development of this In sample programs using ``match``, written as part of the development of this
PEP, a noticeable improvement in code brevity is observed, more than making PEP, a noticeable improvement in code brevity is observed, more than making
up for the additional indentation level. up for the additional indentation level.
@ -239,7 +265,7 @@ up for the additional indentation level.
*Statement vs. Expression.* Some suggestions centered around the idea of *Statement vs. Expression.* Some suggestions centered around the idea of
making ``match`` an expression rather than a statement. However, this making ``match`` an expression rather than a statement. However, this
would fit poorly with Python's statement-oriented nature and lead to would fit poorly with Python's statement-oriented nature and lead to
unusually long and complex expressions with the need to invent new unusually long and complex expressions and the need to invent new
syntactic constructs or break well established syntactic rules. An syntactic constructs or break well established syntactic rules. An
obvious consequence of ``match`` as an expression would be that case obvious consequence of ``match`` as an expression would be that case
clauses could no longer have arbitrary blocks of code attached, but only clauses could no longer have arbitrary blocks of code attached, but only
@ -247,8 +273,46 @@ a single expression. Overall, the strong limitations could in no way
offset the slight simplification in some special use cases. offset the slight simplification in some special use cases.
*Hard vs. Soft Keyword.* There were options to make match a hard keyword,
or choose a different keyword. Although using a hard keyword would simplify
life for simple-minded syntax highlighters, we decided not to use a hard
keyword for several reasons:
Match semantics - Most importantly, the new parser doesn't require us to do this. Unlike
with ``async`` that caused hardships with being a soft keyword for a few
releases, here we can make ``match`` a permanent soft keyword.
- ``match`` is so commonly used in existing code, that it would break
almost every existing program and will put a burden to fix code on many
people who may not even benefit from the new syntax.
- It is hard to find an alternative keyword that would not be commonly used
in existing programs as an identifier, and would still clearly reflect the
meaning of the statement.
**Use "as" or "|" instead of "case" for case clauses.**
The pattern matching proposed here is a combination of multi-branch control
flow (in line with ``switch`` in Algol-derived languages or ``cond`` in Lisp)
and object-deconstruction as found in functional languages. While the proposed
keyword ``case`` highlights the multi-branch aspect, alternative keywords such
as ``as`` would equally be possible, highlighting the deconstruction aspect.
``as`` or ``with``, for instance, also have the advantage of already being
keywords in Python. However, since ``case`` as a keyword can only occur as a
leading keyword inside a ``match`` statement, it is easy for a parser to
distinguish between its use as a keyword or as a variable.
Other variants would use a symbol like ``|`` or ``=>``, or go entirely without
special marker.
Since Python is a statement-oriented language in the tradition of Algol, and as
each composite statement starts with an identifying keyword, ``case`` seemed to
be most in line with Python's style and traditions.
Match Semantics
~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~
The patterns of different case clauses might overlap in that more than The patterns of different case clauses might overlap in that more than
@ -290,8 +354,8 @@ unintuitive and surprising behavior.
A direct consequence of this is that any variable bindings outlive the A direct consequence of this is that any variable bindings outlive the
respective case or match statements. Even patterns that only match a respective case or match statements. Even patterns that only match a
subject partially might bind local variables (this is, in fact, necessary subject partially might bind local variables (this is, in fact, necessary
for guards to function properly). However, this escaping of variable for guards to function properly). However, these semantics for variable
bindings is in line with existing Python structures such as for loops and binding are in line with existing Python structures such as for loops and
with statements. with statements.
@ -301,9 +365,9 @@ Guards
Some constraints cannot be adequately expressed through patterns alone. Some constraints cannot be adequately expressed through patterns alone.
For instance, a 'less' or 'greater than' relationship defies the usual For instance, a 'less' or 'greater than' relationship defies the usual
'equal' semantics of patterns. Moreover, different subpatterns are 'equal' semantics of patterns. Moreover, different subpatterns are
independent and cannot refer to each other. The addition of _guards_ independent and cannot refer to each other. The addition of *guards*
addresses these restrictions: a guard is an arbitrary expression attached addresses these restrictions: a guard is an arbitrary expression attached
to a pattern and that must evaluate to ``True`` for the pattern to succeed. to a pattern and that must evaluate to a "truthy" value for the pattern to succeed.
For example, ``case [x, y] if x < y:`` uses a guard (``if x < y``) to For example, ``case [x, y] if x < y:`` uses a guard (``if x < y``) to
express a 'less than' relationship between two otherwise disjoint capture express a 'less than' relationship between two otherwise disjoint capture
@ -312,15 +376,15 @@ patterns ``x`` and ``y``.
From a conceptual point of view, patterns describe structural constraints From a conceptual point of view, patterns describe structural constraints
on the subject in a declarative style, ideally without any side-effects. on the subject in a declarative style, ideally without any side-effects.
Recall, in particular, that patterns are clearly distinct from expressions, Recall, in particular, that patterns are clearly distinct from expressions,
following different objectives and semantics. Guards then enhance the following different objectives and semantics. Guards then enhance case
patterns in a highly controlled way with arbitrary expressions (that might blocks in a highly controlled way with arbitrary expressions (that might
have side effects). Splitting the overal pattern into a static structural have side effects). Splitting the overall functionality into a static structural
and a dynamic 'evaluative' part not only helps with readability, but can and a dynamically evaluated part not only helps with readability, but can
also introduce dramatic potential for compiler optimizations. To keep this also introduce dramatic potential for compiler optimizations. To keep this
clear separation, guards are only supported on the level of case clauses clear separation, guards are only supported on the level of case clauses
and not for individual patterns. and not for individual patterns.
Example using guards:: **Example** using guards::
def sort(seq): def sort(seq):
match seq: match seq:
@ -354,64 +418,84 @@ seen as a prototype to pattern matching in Python, there is only one
Full pattern matching differs from this in that there is more variety Full pattern matching differs from this in that there is more variety
in structural patterns but only a minimum of binding patterns. in structural patterns but only a minimum of binding patterns.
Patterns differ from assignment targets (as in iterable unpacking) in that Patterns differ from assignment targets (as in iterable unpacking) in two ways:
they impose additional constraints on the structure of the subject and in they impose additional constraints on the structure of the subject, and
that a subject might safely fail to match a specific pattern at any point a subject may safely fail to match a specific pattern at any point
(in iterable unpacking, this constitutes an error). The latter means that (in iterable unpacking, this constitutes an error). The latter means that
patterns should avoid side effects wherever possible, including binding patterns should avoid side effects wherever possible.
values to attributes or subscripts.
This desire to avoid side effects is one reason why capture patterns
don't allow binding values to attributes or subscripts: if the
containing pattern were to fail in a later step, it would be hard to
revert such bindings.
A cornerstone of pattern matching is the possibility of arbitrarily A cornerstone of pattern matching is the possibility of arbitrarily
*nesting patterns*. The nesting allows for expressing deep *nesting patterns*. The nesting allows expressing deep
tree structures (for an example of nested class patterns, see the motivation tree structures (for an example of nested class patterns, see the motivation
section above) as well as alternatives. section above) as well as alternatives.
Although the structural patterns might superficially look like expressions, Although patterns might superficially look like expressions,
it is important to keep in mind that there is a clear distinction. In fact, it is important to keep in mind that there is a clear distinction. In fact,
no pattern is or contains an expression. It is more productive to think of no pattern is or contains an expression. It is more productive to think of
patterns as declarative elements similar to the formal parameters in a patterns as declarative elements similar to the formal parameters in a
function definition. function definition.
Walrus/AS patterns AS Patterns
~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~
Patterns fall into two categories: most patterns impose a (structural) Patterns fall into two categories: most patterns impose a (structural)
constraint that the subject needs to fulfill, whereas the capture pattern constraint that the subject needs to fulfill, whereas the capture pattern
binds the subject to a name without regard for the subject's structure or binds the subject to a name without regard for the subject's structure or
actual value. Consequently, a pattern can either express a constraint or actual value. Consequently, a pattern can either express a constraint or
bind a value, but not both. Walrus/AS patterns fill this gap in that they bind a value, but not both. AS patterns fill this gap in that they
allow the user to specify a general pattern as well as capture the subject allow the user to specify a general pattern as well as capture the subject
in a variable. in a variable.
Typical use cases for the Walrus/AS pattern include OR and Class patterns Typical use cases for the AS pattern include OR and Class patterns
together with a binding name as in, e.g., ``case BinOp(op := '+'|'-', ...):`` together with a binding name as in, e.g., ``case BinOp('+'|'-' as op, ...):``
or ``case [first := int(), second := int()]:``. The latter could be or ``case [int() as first, int() as second]:``. The latter could be
understood as saying that the subject must fulfil two distinct patterns: understood as saying that the subject must fulfil two distinct patterns:
``[first, second]`` as well as ``[int(), int()]``. The Walrus/AS pattern ``[first, second]`` as well as ``[int(), int()]``. The AS pattern
can thus be seen as a special case of an 'and' pattern (see OR patterns can thus be seen as a special case of an 'and' pattern (see OR patterns
below for an additional discussion of 'and' patterns). below for an additional discussion of 'and' patterns).
Example using the Walrus/AS pattern:: In an earlier version, the AS pattern was devised as a 'Walrus pattern',
written as ``case [first:=int(), second:=int()]``. However, using ``as``
offers some advantages over ``:=``:
- The walrus operator ``:=`` is used to capture the result of an expression
on the right hand side, whereas ``as`` generally indicates some form of
'processing' as in ``import foo as bar`` or ``except E as err:``. Indeed,
the pattern ``P as x`` does not assign the pattern ``P`` to ``x``, but
rather the subject that successfully matches ``P``.
- ``as`` allows for a more consistent data flow from left to right (the
attributes in Class patterns also follow a left-to-right data flow).
- The walrus operator is very close to attributes in the Class pattern,
potentially leading to some confusion.
**Example** using the AS pattern::
def simplify_expr(tokens): def simplify_expr(tokens):
match tokens: match tokens:
case [l:=('('|'['), *expr, r:=(')'|']')] if (l+r) in ('()', '[]'): case [('('|'[') as l, *expr, (')'|']') as r] if (l+r) in ('()', '[]'):
return simplify_expr(expr) return simplify_expr(expr)
case [0, op:=('+'|'-'), right]: case [0, ('+'|'-') as op, right]:
return UnaryOp(op, right) return UnaryOp(op, right)
case [left:=(int() | float()) | Num(left), '+', right:=(int() | float()) | Num(right)]: case [(int() | float() as left) | Num(left), '+', (int() | float() as right) | Num(right)]:
return Num(left + right) return Num(left + right)
case [value:=(int() | float())] case [(int() | float()) as value]:
return Num(value) return Num(value)
OR patterns OR Patterns
~~~~~~~~~~~ ~~~~~~~~~~~
The OR pattern allows you to combine 'structurally equivalent' alternatives The OR pattern allows you to combine 'structurally equivalent' alternatives
into a new pattern, i.e. several patterns can share a common handler. If any into a new pattern, i.e. several patterns can share a common handler. If any
one of an OR pattern's subpatterns matches the given subject, the entire OR of an OR pattern's subpatterns matches the subject, the entire OR
pattern succeeds. pattern succeeds.
Statically typed languages prohibit the binding of names (capture patterns) Statically typed languages prohibit the binding of names (capture patterns)
@ -422,13 +506,16 @@ must bind the same set of variables so as not to leave potentially undefined
names. With two alternatives ``P | Q``, this means that if *P* binds the names. With two alternatives ``P | Q``, this means that if *P* binds the
variables *u* and *v*, *Q* must bind exactly the same variables *u* and *v*. variables *u* and *v*, *Q* must bind exactly the same variables *u* and *v*.
There was some discussion on whether to use the bar ``|`` or the keyword There was some discussion on whether to use the bar symbol ``|`` or the ``or``
``or`` in order to separate alternatives. The OR pattern does not fully fit keyword to separate alternatives. The OR pattern does not fully fit
the existing semantics and usage of either of these two symbols. However, the existing semantics and usage of either of these two symbols. However,
``|`` is the symbol of choice in all programming languages with support of ``|`` is the symbol of choice in all programming languages with support of
the OR pattern and is even used in that capacity for regular expressions in the OR pattern and is used in that capacity for regular expressions in
Python as well. Moreover, ``|`` is not only used for bitwise OR, but also Python as well. It is also the traditional separator between alternatives
in formal grammars (including Python's).
Moreover, ``|`` is not only used for bitwise OR, but also
for set unions and dict merging (:pep:`584`). for set unions and dict merging (:pep:`584`).
Other alternatives were considered as well, but none of these would allow Other alternatives were considered as well, but none of these would allow
OR-patterns to be nested inside other patterns: OR-patterns to be nested inside other patterns:
@ -468,8 +555,9 @@ OR-patterns to be nested inside other patterns:
print("A corner of the unit square") print("A corner of the unit square")
*AND and NOT patterns.* **AND and NOT Patterns**
This proposal defines an OR-pattern (|) to match one of several alternates;
Since this proposal defines an OR-pattern (``|``) to match one of several alternates,
why not also an AND-pattern (``&``) or even a NOT-pattern (``!``)? why not also an AND-pattern (``&``) or even a NOT-pattern (``!``)?
Especially given that some other languages (``F#`` for example) support Especially given that some other languages (``F#`` for example) support
AND-patterns. AND-patterns.
@ -480,27 +568,28 @@ all attributes and elements mentioned must be present for the match to
succeed. Guard conditions can also support many of the use cases that a succeed. Guard conditions can also support many of the use cases that a
hypothetical 'and' operator would be used for. hypothetical 'and' operator would be used for.
A negation of a match pattern using the operator ``!`` as a prefix would match A negation of a match pattern using the operator ``!`` as a prefix
exactly if the pattern itself does not match. For instance, ``!(3 | 4)`` would match exactly if the pattern itself does not match. For
would match anything except ``3`` or ``4``. However, there is evidence from instance, ``!(3 | 4)`` would match anything except ``3`` or ``4``.
other languages that this is rarely useful and primarily used as double However, there is `evidence from other languages
negation ``!!`` to control variable scopes and prevent variable bindings <https://dl.acm.org/doi/abs/10.1145/2480360.2384582>`_ that this is
(which does not apply to Python). Other use cases are better expressed using rarely useful, and primarily used as double negation ``!!`` to control
guards. variable scopes and prevent variable bindings (which does not apply to
Python). Other use cases are better expressed using guards.
In the end, it was decided that this would make the syntax more complex In the end, it was decided that this would make the syntax more complex
without adding a significant benefit. without adding a significant benefit. It can always be added later.
Example using the OR pattern:: **Example** using the OR pattern::
def simplify(expr): def simplify(expr):
match expr: match expr:
case ('/', 0, 0): case ('/', 0, 0):
return expr return expr
case ('*' | '/', 0, _): case ('*'|'/', 0, _):
return 0 return 0
case ('+' | '-', x, 0) | ('+', 0, x) | ('*', 1, x) | ('*' | '/', x, 1): case ('+'|'-', x, 0) | ('+', 0, x) | ('*', 1, x) | ('*'|'/', x, 1):
return x return x
return expr return expr
@ -511,8 +600,8 @@ Literal Patterns
~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~
Literal patterns are a convenient way for imposing constraints on the Literal patterns are a convenient way for imposing constraints on the
value of a subject, rather than its type or structure. Literal patterns value of a subject, rather than its type or structure. They also
even allow you to emulate a switch statement using pattern matching. allow you to emulate a switch statement using pattern matching.
Generally, the subject is compared to a literal pattern by means of standard Generally, the subject is compared to a literal pattern by means of standard
equality (``x == y`` in Python syntax). Consequently, the literal patterns equality (``x == y`` in Python syntax). Consequently, the literal patterns
@ -522,7 +611,7 @@ match the same set of objects because ``True == 1`` holds. However, we
believe that many users would be surprised finding that ``case True:`` believe that many users would be surprised finding that ``case True:``
matched the subject ``1.0``, resulting in some subtle bugs and convoluted matched the subject ``1.0``, resulting in some subtle bugs and convoluted
workarounds. We therefore adopted the rule that the three singleton workarounds. We therefore adopted the rule that the three singleton
objects ``None``, ``False`` and ``True`` match by identity (``x is y`` in patterns ``None``, ``False`` and ``True`` match by identity (``x is y`` in
Python syntax) rather than equality. Hence, ``case True:`` will match only Python syntax) rather than equality. Hence, ``case True:`` will match only
``True`` and nothing else. Note that ``case 1:`` would still match ``True``, ``True`` and nothing else. Note that ``case 1:`` would still match ``True``,
though, because the literal pattern ``1`` works by equality and not identity. though, because the literal pattern ``1`` works by equality and not identity.
@ -530,20 +619,22 @@ though, because the literal pattern ``1`` works by equality and not identity.
Early ideas to induce a hierarchy on numbers so that ``case 1.0`` would Early ideas to induce a hierarchy on numbers so that ``case 1.0`` would
match both the integer ``1`` and the floating point number ``1.0``, whereas match both the integer ``1`` and the floating point number ``1.0``, whereas
``case 1:`` would only match the integer ``1`` were eventually dropped in ``case 1:`` would only match the integer ``1`` were eventually dropped in
favor of the simpler and consistent rule based on equality. Moreover, any favor of the simpler and more consistent rule based on equality. Moreover, any
additional checks whether the subject is an instance of ``numbers.Integral`` additional checks whether the subject is an instance of ``numbers.Integral``
would come at a high runtime cost to introduce what would essentially be would come at a high runtime cost to introduce what would essentially be
novel in Python. When needed, the explicit syntax ``case int(1):`` might a novel idea in Python. When needed, the explicit syntax ``case int(1):`` can
be used. be used.
Recall that literal patterns are *not* expressions, but directly denote a Recall that literal patterns are *not* expressions, but directly
specific value or object. From a syntactical point of view, we have to denote a specific value. From a pragmatic point of view, we want to
ensure that negative and complex numbers can equally be used as patterns, allow using negative and even complex values as literal patterns, but
although they are not atomic literal values (i.e. the seeming literal value they are not atomic literals (only unsigned real and imaginary numbers
``-3+4j`` would syntactically be an expression of the form are). E.g., ``-3+4j`` is syntactically an expression of the form
``BinOp(UnaryOp('-', 3), '+', 4j)``, but as expressions are not part of ``BinOp(UnaryOp('-', 3), '+', 4j)``. Since expressions are not part
patterns, we added syntactic support for such complex value literals without of patterns, we had to add explicit syntactic support for such values
having to resort to full expressions). Interpolated *f*-strings, on the without having to resort to full expressions.
Interpolated *f*-strings, on the
other hand, are not literal values, despite their appearance and can other hand, are not literal values, despite their appearance and can
therefore not be used as literal patterns (string concatenation, however, therefore not be used as literal patterns (string concatenation, however,
is supported). is supported).
@ -551,7 +642,27 @@ is supported).
Literal patterns not only occur as patterns in their own right, but also Literal patterns not only occur as patterns in their own right, but also
as keys in *mapping patterns*. as keys in *mapping patterns*.
Example using Literal patterns::
**Range matching patterns.**
This would allow patterns such as ``1...6``. However, there are a host of
ambiguities:
* Is the range open, half-open, or closed? (I.e. is ``6`` included in the
above example or not?)
* Does the range match a single number, or a range object?
* Range matching is often used for character ranges ('a'...'z') but that
won't work in Python since there's no character data type, just strings.
* Range matching can be a significant performance optimization if you can
pre-build a jump table, but that's not generally possible in Python due
to the fact that names can be dynamically rebound.
Rather than creating a special-case syntax for ranges, it was decided
that allowing custom pattern objects (``InRange(0, 6)``) would be more flexible
and less ambiguous; however those ideas have been postponed for the time
being.
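
Until such pattern objects exist, a guard can express a range check with
explicit bounds. The sketch below is only one possible spelling, assuming
guards behave as in Python 3.10; ``die_face`` is a hypothetical name::

    def die_face(n):
        match n:
            case int() if 1 <= n <= 6:
                # The guard spells out that both end points are included.
                return "valid face"
            case _:
                return "out of range"

Writing the comparison out avoids the open versus closed ambiguity, at the
cost of some verbosity.
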
**Example** using Literal patterns::
def simplify(expr): def simplify(expr):
match expr: match expr:
@ -579,7 +690,7 @@ Capture Patterns
Capture patterns take on the form of a name that accepts any value and binds Capture patterns take on the form of a name that accepts any value and binds
it to a (local) variable (unless the name is declared as ``nonlocal`` or it to a (local) variable (unless the name is declared as ``nonlocal`` or
``global``). In that sense, a simple capture pattern is basically equivalent ``global``). In that sense, a capture pattern is similar
to a parameter in a function definition (when the function is called, each to a parameter in a function definition (when the function is called, each
parameter binds the respective argument to a local variable in the function's parameter binds the respective argument to a local variable in the function's
scope). scope).
@ -599,7 +710,7 @@ repeated use of names later on.
There were calls to explicitly mark capture patterns and thus identify them There were calls to explicitly mark capture patterns and thus identify them
as binding targets. According to that idea, a capture pattern would be as binding targets. According to that idea, a capture pattern would be
written as, e.g. ``?x``, ``$x`` or ``=x``. The aim of such explicit capture written as, e.g. ``?x``, ``$x`` or ``=x``. The aim of such explicit capture
markers is to let an unmarked name be a constant value pattern (see below). markers is to let an unmarked name be a value pattern (see below).
However, this is based on the misconception that pattern matching was an However, this is based on the misconception that pattern matching was an
extension of *switch* statements, placing the emphasis on fast switching based extension of *switch* statements, placing the emphasis on fast switching based
on (ordinal) values. Such a *switch* statement has indeed been proposed for on (ordinal) values. Such a *switch* statement has indeed been proposed for
@ -611,7 +722,13 @@ betray the objective of the proposed pattern matching syntax and simplify
a secondary use case at the expense of additional syntactic clutter for a secondary use case at the expense of additional syntactic clutter for
core cases. core cases.
Example using Capture patterns:: It has been proposed that capture patterns are not needed at all,
since the equivalent effect can be obtained by combining an AS
pattern with a wildcard pattern (e.g., ``case _ as x`` is equivalent
to ``case x``). However, this would be unpleasantly verbose,
especially given that we expect capture patterns to be very common.
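
The equivalence can be sketched as follows, assuming the AS-pattern syntax
as released in Python 3.10; both function names are hypothetical::

    def with_capture(subject):
        match subject:
            case x:          # capture pattern: binds the subject to x
                return x

    def with_as(subject):
        match subject:
            case _ as x:     # wildcard combined with an AS pattern
                return x

Both functions return their argument unchanged for any subject.
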
**Example** using Capture patterns::
def average(*args): def average(*args):
match args: match args:
@ -621,8 +738,8 @@ Example using Capture patterns::
return x return x
case []: case []:
return 0 return 0
case x: # captures the entire sequence case a: # captures the entire sequence
return sum(x) / len(x) return sum(a) / len(a)
.. _wildcard_pattern: .. _wildcard_pattern:
@ -660,27 +777,38 @@ of items is omitted::
case [a, ..., z]: ... case [a, ..., z]: ...
case [a, *, z]: ... case [a, *, z]: ...
Both examples look like the would match a sequence of at two or more items, Either example looks like it would match a sequence of two or more
capturing the first and last values. items, capturing the first and last values. While that may be the
ultimate "wildcard", it does not convey the desired semantics.
A single wildcard clause (i.e. ``case _:``) is semantically equivalent to An alternative that does not suggest an arbitrary number of items
an ``else:``. It accepts any subject without binding it to a variable or would be ``?``. However, this would require changes in the tokenizer,
performing any other operation. However, the wildcard pattern is in and it would put Python in a rather unique position:
contrast to ``else`` usable as a subpattern in nested patterns.
Finally note that the underscore is used as a wildcard pattern in *every* The underscore is used as a wildcard pattern in *every*
programming language with pattern matching that we could find programming language with pattern matching that we could find
(including *C#*, *Elixir*, *Erlang*, *F#*, *Grace*, *Haskell*, (including *C#*, *Elixir*, *Erlang*, *F#*, *Grace*, *Haskell*,
*Mathematica*, *OCaml*, *Ruby*, *Rust*, *Scala*, *Swift*, and *Thorn*). *Mathematica*, *OCaml*, *Ruby*, *Rust*, *Scala*, *Swift*, and *Thorn*).
Keeping in mind that many users of Python also work with other programming Keeping in mind that many users of Python also work with other programming
languages, have prior experience when learning Python, or moving on to languages, have prior experience when learning Python, and may move on to
other languages after having learnt Python, we find that such well other languages after having learned Python, we find that such
established standards are important and relevant with respect to well-established standards are important and relevant with respect to
readability and learnability. In our view, concerns that this wildcard readability and learnability. In our view, concerns that this wildcard
means that a regular name received special treatment are not strong means that a regular name received special treatment are not strong
enough to introduce syntax that would make Python special. enough to introduce syntax that would make Python special.
Example using the Wildcard pattern:: *Else blocks.* A case block without a guard whose pattern is a single
wildcard (i.e., ``case _:``) accepts any subject without binding it to
a variable or performing any other operation. It is thus semantically
equivalent to ``else:``, if it were supported. However, adding such
an else block to the match statement syntax would not remove the need
for the wildcard pattern in other contexts. Another argument against
this is that there would be two plausible indentation levels for an
else block: aligned with ``case`` or aligned with ``match``. The
authors have found it quite contentious which indentation level to
prefer.
**Example** using the Wildcard pattern::
def is_closed(sequence): def is_closed(sequence):
match sequence: match sequence:
@ -692,45 +820,43 @@ Example using the Wildcard pattern::
return False return False
.. _constant_value_pattern: .. _value_pattern:
Value Patterns Value Patterns
~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~
It is good programming style to use named constants for parametric values or It is good programming style to use named constants for parametric values or
to clarify the meaning of particular values. Clearly, it would be desirable to clarify the meaning of particular values. Clearly, it would be preferable
to write ``case (HttpStatus.OK, body):`` rather than to write ``case (HttpStatus.OK, body):`` over
``case (200, body):``, for example. The main issue that arises here is how to ``case (200, body):``, for example. The main issue that arises here is how to
distinguish capture patterns (variables) from constant value patterns. The distinguish capture patterns (variable bindings) from value patterns. The
general discussion surrounding this issue has brought forward a plethora of general discussion surrounding this issue has brought forward a plethora of
options, which we cannot all fully list here. options, which we cannot all fully list here.
Strictly speaking, constant value patterns are not really necessary, but Strictly speaking, value patterns are not really necessary, but
could be implemented using guards, i.e. could be implemented using guards, i.e.
``case (status, body) if status == HttpStatus.OK:``. Nonetheless, the ``case (status, body) if status == HttpStatus.OK:``. Nonetheless, the
convenience of constant value patterns is unquestioned and obvious. convenience of value patterns is unquestioned and obvious.
The observation that constants tend to be written in uppercase letters or The observation that constants tend to be written in uppercase letters or
collected in enumeration-like namespaces suggests possible rules to discern collected in enumeration-like namespaces suggests possible rules to discern
constants syntactically. However, the idea of using upper vs. lower case as constants syntactically. However, the idea of using upper- vs. lowercase as
a marker has been met with scepticism since there is no similar precedent a marker has been met with scepticism since there is no similar precedent
in core Python (although it is common in other languages). We therefore only in core Python (although it is common in other languages). We therefore only
adopted the rule that any dotted name (i.e. attribute access) is to be adopted the rule that any dotted name (i.e., attribute access) is to be
interpreted as a constant value pattern like ``HttpStatus.OK`` interpreted as a value pattern, for example ``HttpStatus.OK``
above. This precludes, in particular, local variables from acting as above. This precludes, in particular, local variables and global
constants. variables defined in the current module from acting as constants.
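
The consequence of the dotted-name rule can be sketched as follows, assuming
Python 3.10 semantics. The ``HttpStatus`` enumeration, the module-level
``OK`` and the function names are made up for this illustration::

    from enum import IntEnum

    class HttpStatus(IntEnum):
        OK = 200
        NOT_FOUND = 404

    OK = 200  # a module-level "constant"

    def with_value_pattern(status):
        match status:
            case HttpStatus.OK:   # dotted name: compared by equality
                return "ok"
            case _:
                return "not ok"

    def with_bare_name(status):
        match status:
            case OK:              # bare name: a capture pattern
                return OK         # returns whatever was passed in

    def with_guard(status):
        match status:
            case s if s == OK:    # a guard can still compare against OK
                return "ok"
            case _:
                return "not ok"

In ``with_bare_name`` the name ``OK`` becomes a local variable bound to the
subject, so the module-level value is never consulted.
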
Global variables can only be directly used as constant when defined in other A proposed rule to use a leading dot (e.g.
modules, although there are workarounds to access the current module as a
namespace as well. A proposed rule to use a leading dot (e.g.
``.CONSTANT``) for that purpose was criticised because it was felt that the ``.CONSTANT``) for that purpose was criticised because it was felt that the
dot would not be a visible-enough marker for that purpose. Partly inspired dot would not be a visible-enough marker for that purpose. Partly inspired
by use cases in other programming languages, a number of different by forms found in other programming languages, a number of different
markers/sigils were proposed (such as ``^CONSTANT``, ``$CONSTANT``, markers/sigils were proposed (such as ``^CONSTANT``, ``$CONSTANT``,
``==CONSTANT``, ``CONSTANT?``, or the word enclosed in backticks), although ``==CONSTANT``, ``CONSTANT?``, or the word enclosed in backticks), although
there was no obvious or natural choice. The current proposal therefore there was no obvious or natural choice. The current proposal therefore
leaves the discussion and possible introduction of such a 'constant' marker leaves the discussion and possible introduction of such a 'constant' marker
for future PEPs. for a future PEP.
Distinguishing the semantics of names based on whether it is a global Distinguishing the semantics of names based on whether it is a global
variable (i.e. the compiler would treat global variables as constants rather variable (i.e. the compiler would treat global variables as constants rather
@ -740,7 +866,7 @@ patterns. Moreover, pattern matching could not be used directly inside a
module's scope because all variables would be global, making capture module's scope because all variables would be global, making capture
patterns impossible. patterns impossible.
Example using the Value pattern:: **Example** using the Value pattern::
def handle_reply(reply): def handle_reply(reply):
match reply: match reply:
@ -782,7 +908,7 @@ iterable.
possible. possible.
- A starred pattern will capture a sub-sequence of arbitrary length, - A starred pattern will capture a sub-sequence of arbitrary length,
mirroring iterable unpacking as well. Only one starred item may be again mirroring iterable unpacking. Only one starred item may be
present in any sequence pattern. In theory, patterns such as ``(*_, 3, *_)`` present in any sequence pattern. In theory, patterns such as ``(*_, 3, *_)``
could be understood as expressing any sequence containing the value ``3``. could be understood as expressing any sequence containing the value ``3``.
In practice, however, this would only work for a very narrow set of use In practice, however, this would only work for a very narrow set of use
@ -790,31 +916,36 @@ iterable.
- The sequence pattern does *not* iterate through an iterable subject. All - The sequence pattern does *not* iterate through an iterable subject. All
elements are accessed through subscripting and slicing, and the subject must elements are accessed through subscripting and slicing, and the subject must
be an instance of ``collections.abc.Sequence`` (including, in particular, be an instance of ``collections.abc.Sequence``. This includes, of course,
lists and tuples, but excluding strings and bytes, as well as sets and lists and tuples, but excludes e.g. sets and dictionaries. While it would
dictionaries). include strings and bytes, we make an exception for these (see below).
A sequence pattern cannot just iterate through any iterable object. The A sequence pattern cannot just iterate through any iterable object. The
consumption of elements from the iteration would have to be undone if the consumption of elements from the iteration would have to be undone if the
overall pattern fails, which is not possible. overall pattern fails, which is not feasible.
Relying on ``len()`` and subscripting and slicing alone does not work to To identify sequences we cannot rely on ``len()`` and subscripting and
identify sequences because sequences share the protocol with more general slicing alone, because sequences share these protocols with mappings
maps (dictionaries) in this regard. It would be surprising if a sequence (e.g. ``dict``) in this regard. It would be surprising if a sequence
pattern also matched dictionaries or other custom objects that implement pattern also matched dictionaries or other objects implementing
the mapping protocol (i.e. ``__getitem__``). The interpreter therefore the mapping protocol (i.e. ``__getitem__``). The interpreter therefore
performs an instance check to ensure that the subject in question really performs an instance check to ensure that the subject in question really
is a sequence (of known type). is a sequence (of known type). (As an optimization of the most common
case, if the subject is exactly a list or a tuple, the instance check
can be skipped.)
String and bytes objects have a dual nature: they are both 'atomic' objects String and bytes objects have a dual nature: they are both 'atomic' objects
in their own right, as well as sequences (with a strongly recursive nature in their own right, as well as sequences (with a strongly recursive nature
in that a string is a sequence of strings). The typical behavior and use in that a string is a sequence of strings). The typical behavior and use
cases for strings and bytes are different enough from that of tuples and cases for strings and bytes are different enough from those of tuples and
lists to warrant a clear distinction. It is in fact often unintuitive and lists to warrant a clear distinction. It is in fact often unintuitive and
unintended that strings pass for sequences as evidenced by regular questions unintended that strings pass for sequences, as evidenced by regular questions
and complaints. Strings and bytes are therefore not matched by a sequence and complaints. Strings and bytes are therefore not matched by a sequence
pattern, limiting the sequence pattern to a very specific understanding of pattern, limiting the sequence pattern to a very specific understanding of
'sequence'. 'sequence'. The built-in ``bytearray`` type, being a mutable version of
``bytes``, also deserves an exception; but we don't intend to
enumerate all other types that may be used to represent bytes
(e.g. some, but not all, instances of ``memoryview`` and ``array.array``).
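
The resulting notion of 'sequence' can be probed with a small sketch,
assuming the behavior released in Python 3.10; the function name ``kind`` is
hypothetical::

    def kind(subject):
        match subject:
            case [first, *rest]:
                return "sequence"
            case _:
                return "not a sequence"

    results = {
        "list": kind([1, 2]),
        "tuple": kind((1,)),
        "range": kind(range(3)),
        "str": kind("ab"),
        "bytes": kind(b"ab"),
        "bytearray": kind(bytearray(b"ab")),
        "dict": kind({0: "a"}),
        "iterator": kind(iter([1, 2])),
    }

Only the list, tuple and ``range`` subjects are reported as sequences.
Strings, bytes-like objects, dictionaries and one-shot iterators all fall
through to the wildcard case.
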
.. _mapping_pattern: .. _mapping_pattern:
@ -823,9 +954,9 @@ Mapping Patterns
~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~
Dictionaries or mappings in general are one of the most important and most Dictionaries or mappings in general are one of the most important and most
widely used data structures in Python. In contrast to sequences mappings widely used data structures in Python. In contrast to sequences, mappings
are built for fast direct access to arbitrary elements (identified by a key). are built for fast direct access to arbitrary elements identified by a key.
In most use cases an element is retrieved from a dictionary by a known key In most cases an element is retrieved from a dictionary by a known key
without regard for any ordering or other key-value pairs stored in the same without regard for any ordering or other key-value pairs stored in the same
dictionary. Particularly common are string keys. dictionary. Particularly common are string keys.
@ -836,13 +967,13 @@ pattern does not check for the presence of additional keys. Should it be
necessary to impose an upper bound on the mapping and ensure that no necessary to impose an upper bound on the mapping and ensure that no
additional keys are present, then the usual double-star-pattern ``**rest`` additional keys are present, then the usual double-star-pattern ``**rest``
can be used. The special case ``**_`` with a wildcard, however, is not can be used. The special case ``**_`` with a wildcard, however, is not
supported as it would not have any effect, but might lead to a wrong supported as it would not have any effect, but might lead to an incorrect
understanding of the mapping pattern's semantics. understanding of the mapping pattern's semantics.
To avoid overly expensive matching algorithms, keys must be literals or To avoid overly expensive matching algorithms, keys must be literals or
constant values. value patterns.
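
A short sketch of how extra keys are treated, assuming Python 3.10
semantics; the function name ``describe`` and the keys are hypothetical::

    def describe(obj):
        match obj:
            case {"color": "red", **rest} if not rest:
                # **rest collects the remaining items; the guard then
                # insists that there are none.
                return "only the color key"
            case {"color": "red"}:
                # Additional keys are silently ignored here.
                return "red, with additional keys"
            case _:
                return "no match"

Combining ``**rest`` with a guard is one way to impose the upper bound
mentioned above.
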
Example using the Mapping pattern:: **Example** using the Mapping pattern::
def change_red_to_blue(json_obj): def change_red_to_blue(json_obj):
match json_obj: match json_obj:
@ -858,10 +989,10 @@ Example using the Mapping pattern::
Class Patterns Class Patterns
~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~
Class patterns fulfil two purposes: checking whether a given subject is Class patterns fulfill two purposes: checking whether a given subject is
indeed an instance of a specific class and extracting data from specific indeed an instance of a specific class, and extracting data from specific
attributes of the subject. A quick survey revealed that ``isinstance()`` attributes of the subject. Anecdotal evidence revealed that ``isinstance()``
is indeed one of the most often used functions in Python in terms of is one of the most often used functions in Python in terms of
static occurrences in programs. Such instance checks typically precede static occurrences in programs. Such instance checks typically precede
a subsequent access to information stored in the object, or a possible a subsequent access to information stored in the object, or a possible
manipulation thereof. A typical pattern might be along the lines of:: manipulation thereof. A typical pattern might be along the lines of::
@ -873,7 +1004,7 @@ manipulation thereof. A typical pattern might be along the lines of::
elif isinstance(node, Leaf): elif isinstance(node, Leaf):
print(node.value) print(node.value)
In many cases, however, class patterns occur nested as in the example In many cases class patterns occur nested, as in the example
given in the motivation:: given in the motivation::
if (isinstance(node, BinOp) and node.op == "+" if (isinstance(node, BinOp) and node.op == "+"
@ -881,8 +1012,8 @@ given in the motivation::
a, b, c = node.left, node.right.left, node.right.right a, b, c = node.left, node.right.left, node.right.right
# Handle a + b*c # Handle a + b*c
The class pattern lets you concisely specify both an instance-check as The class pattern lets you concisely specify both an instance check
well as relevant attributes (with possible further constraints). It is and relevant attributes (with possible further constraints). It is
thereby very tempting to write, e.g., ``case Node(left, right):`` in the thereby very tempting to write, e.g., ``case Node(left, right):`` in the
first case above and ``case Leaf(value):`` in the second. While this first case above and ``case Leaf(value):`` in the second. While this
indeed works well for languages with strict algebraic data types, it is indeed works well for languages with strict algebraic data types, it is
@ -890,14 +1021,14 @@ problematic with the structure of Python objects.
When dealing with general Python objects, we face a potentially very large When dealing with general Python objects, we face a potentially very large
number of unordered attributes: an instance of ``Node`` contains a large number of unordered attributes: an instance of ``Node`` contains a large
number of attributes (most of which are 'private methods' such as, e.g., number of attributes (most of which are 'special methods' such as
``__repr__``). Moreover, the interpreter cannot reliably deduce which of ``__repr__``). Moreover, the interpreter cannot reliably deduce the
the attributes comes first and which comes second. For an object that ordering of attributes. For an object that
represents a circle, say, there is no inherently obvious ordering of the represents a circle, say, there is no inherently obvious ordering of the
attributes ``x``, ``y`` and ``radius``. attributes ``x``, ``y`` and ``radius``.
We envision two possibilities for dealing with this issue: either explicitly We envision two possibilities for dealing with this issue: either explicitly
name the attributes of interest or provide an additional mapping that tells name the attributes of interest, or provide an additional mapping that tells
the interpreter which attributes to extract and in which order. Both the interpreter which attributes to extract and in which order. Both
approaches are supported. Moreover, explicitly naming the attributes of approaches are supported. Moreover, explicitly naming the attributes of
interest lets you further specify the required structure of an object; if interest lets you further specify the required structure of an object; if
@ -948,6 +1079,20 @@ the explicit construction of instances, where class patterns ``c(p, q)``
deliberately mirror the syntax of creating instances. deliberately mirror the syntax of creating instances.
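
Both ways of selecting attributes can be sketched as follows. The example
assumes the mechanism as released in Python 3.10, where the class attribute
``__match_args__`` supplies the positional ordering; the ``Circle`` class and
the function name are made up for this illustration::

    class Circle:
        # Tells the interpreter which attributes positional
        # subpatterns refer to, and in which order.
        __match_args__ = ("x", "y", "radius")

        def __init__(self, x, y, radius):
            self.x = x
            self.y = y
            self.radius = radius

    def describe(shape):
        match shape:
            case Circle(0, 0, r):
                # Positional form, resolved through __match_args__.
                return f"circle of radius {r} at the origin"
            case Circle(radius=r):
                # Keyword form: names the attribute of interest directly.
                return f"circle of radius {r}"
            case _:
                return "not a circle"
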
**Type annotations for pattern variables.**
The proposal was to combine patterns with type annotations::

    match x:
        case [a: int, b: str]: print(f"An int {a} and a string {b}:")
        case [a: int, b: int, c: int]: print(f"Three ints", a, b, c)
        ...

This idea has a lot of problems. For one, the colon can only
be used inside of brackets or parens, otherwise the syntax becomes
ambiguous. And because Python disallows ``isinstance()`` checks
on generic types, type annotations containing generics will not
work as expected.
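
For comparison, similar checks can already be spelled with class patterns.
The sketch below assumes Python 3.10 semantics and uses hypothetical function
names. The second function demonstrates the ``isinstance()`` restriction on
parameterized generics mentioned above::

    def describe(x):
        match x:
            case [int(a), str(b)]:
                return f"An int {a} and a string {b}"
            case [int(a), int(b), int(c)]:
                return f"Three ints {a} {b} {c}"
            case _:
                return "no match"

    def generic_check_fails():
        # isinstance() rejects parameterized generics such as list[int].
        try:
            isinstance([], list[int])
        except TypeError:
            return True
        return False
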
History and Context History and Context
=================== ===================
@ -1052,7 +1197,6 @@ Or you would combine these ideas to write ``Node(right=y)`` so as to require
an instance of ``Node`` but only extract the value of the ``right`` attribute. an instance of ``Node`` but only extract the value of the ``right`` attribute.
Copyright Copyright
========= =========