PEP 635: many improvements (#1663)
* PEP 635: Tweaks markup

  Consistently Capitalize Headings.
  Remove extra blank lines (two is enough).
  Add a few TODOs.
  Fix a few typos.

* Went over much of PEP 635 with a fine comb

  I got as far as capture patterns.

* Tweak wildcard patterns (adding '?'); muse on 'else'
* Reviewed up to and including sequence patterns
* Checkpoint -- got halfway through Class Patterns
* Changed Walrus to AS and added rationales (Tobias)
* Fix AS-pattern example

Co-authored-by: Tobias Kohn <webmaster@tobiaskohn.ch>
parent
0181d5c214
commit
a4502e04d6
440
pep-0635.rst
|
@ -15,7 +15,6 @@ Post-History:
|
|||
Resolution:
|
||||
|
||||
|
||||
|
||||
Abstract
|
||||
========
|
||||
|
||||
|
@ -31,7 +30,6 @@ TODO: Go over the feedback from the SC and make sure everything's
|
|||
somehow addressed.
|
||||
|
||||
|
||||
|
||||
Motivation
|
||||
==========
|
||||
|
||||
|
@ -88,7 +86,7 @@ We believe that adding pattern matching to Python will enable Python
|
|||
users to write cleaner, more readable code for examples like those
|
||||
above, and many others.
|
||||
|
||||
Pattern matching and OO
|
||||
Pattern Matching and OO
|
||||
-----------------------
|
||||
|
||||
Pattern matching is complementary to the object-oriented paradigm.
|
||||
|
@ -111,21 +109,31 @@ Like the Visitor pattern, pattern matching allows for a strict separation
|
|||
of concerns: specific actions or data processing is independent of the
|
||||
class hierarchy or manipulated objects. When dealing with predefined or
|
||||
even built-in classes, in particular, it is often impossible to add further
|
||||
methods to the individual classes. Pattern matching not only releaves the
|
||||
methods to the individual classes. Pattern matching not only relieves the
|
||||
programmer or class designer from the burden of the boilerplate code needed
|
||||
for the Visitor pattern, but is also flexible enough to directly work with
|
||||
built-in types. It naturally distinguishes between sequences of different
|
||||
lengths, who might all share the same class despite obviously differing
|
||||
lengths, which might all share the same class despite obviously differing
|
||||
structures. Moreover, pattern matching automatically takes inheritance
|
||||
into account: a class *D* inheriting from *C* will be handled by a pattern
|
||||
that targets *C* by default.
|
||||
|
||||
Object-oriented programming is geared towards single-dispatch: it is a
|
||||
single instance (or the type thereof) that determines which method is to
|
||||
be called. This leads to a somewhat artificial situation in the case of binary
|
||||
operators where both objects might play an equal role in deciding which
|
||||
implementation to use (Python addresses this through the use of reversed
|
||||
binary methods). Pattern matching is structurally better suited to handle
|
||||
such situations of multi-dispatch, where the action to be taken depends on
|
||||
the types of several objects in equal parts.
|
||||
|
||||
TODO: Could we say more here?
|
||||
|
||||
Pattern and functional style
|
||||
----------------------------
|
||||
|
||||
Most Python applications and libraries are not written in a consistent
|
||||
Patterns and Functional Style
|
||||
-----------------------------
|
||||
|
||||
Many Python applications and libraries are not written in a consistent
|
||||
OO style -- unlike Java, Python encourages defining functions at the
|
||||
top-level of a module, and for simple data structures, tuples (or
|
||||
named tuples or lists) and dictionaries are often used exclusively or
|
||||
|
@ -146,33 +154,51 @@ programming style.
|
|||
Rationale
|
||||
=========
|
||||
|
||||
TBD.
|
||||
|
||||
This section should provide the rationale for individual design decisions.
|
||||
This section provides the rationale for individual design decisions.
|
||||
It takes the place of "Rejected ideas" in the standard PEP format.
|
||||
It is organized in sections corresponding to the specification (PEP 634).
|
||||
|
||||
TODO: Cross-check against PEP 622 as well as (private) SC feedback.
|
||||
|
||||
Overview and terminology
|
||||
|
||||
Overview and Terminology
|
||||
------------------------
|
||||
|
||||
TODO: What to put here?
|
||||
|
||||
Much of the power of pattern matching comes from the nesting of subpatterns.
|
||||
That the success of a pattern match depends directly on the success of
|
||||
its subpatterns is thus a cornerstone of the design. However, although a
|
||||
pattern like ``P(Q(), R())`` succeeds only if both subpatterns ``Q()``
|
||||
and ``R()`` succeed (i.e. the success of pattern ``P`` depends on ``Q``
|
||||
and ``R``), the pattern ``P`` is checked first. If ``P`` fails, neither
|
||||
``Q()`` nor ``R()`` will be tried (this is a direct consequence of the
|
||||
fact that if ``P`` fails, there are no subjects to match against ``Q()``
|
||||
and ``R()`` in the first place).
|
||||
|
||||
Also note that patterns bind names to values rather than performing an
|
||||
assignment. This reflects the fact that patterns aim to not have side
|
||||
effects, which also means that Capture or AS patterns cannot assign a
|
||||
value to an attribute or subscript. We thus consistently use the term
|
||||
'bind' instead of 'assign' to emphasise this subtle difference between
|
||||
traditional assignments and name binding in patterns.
|
||||
|
||||
|
||||
The ``match`` statement
|
||||
-----------------------
|
||||
The Match Statement
|
||||
-------------------
|
||||
|
||||
The match statement evaluates an expression to produce a subject, finds the
|
||||
first pattern that matches the subject and executes the associated block
|
||||
first pattern that matches the subject, and executes the associated block
|
||||
of code. Syntactically, the match statement thus takes an expression and
|
||||
a sequence of case clauses, where each case clause comprises a pattern and
|
||||
a block of code.
|
||||
|
||||
Since case clauses comprise a block of code, they adhere to the existing
|
||||
indentation scheme with the syntactic structure of
|
||||
``<keyword> ...: <(indented) block>``, which in turn makes it a (compound)
|
||||
statement. The chosen keyword ``case`` reflects its widespread use in
|
||||
``<keyword> ...: <(indented) block>``, which resembles a compound
|
||||
statement. The keyword ``case`` reflects its widespread use in
|
||||
pattern matching languages, ignoring those languages that use other
|
||||
syntactic means such as a symbol like ``|`` because it would not fit
|
||||
syntactic means such as a symbol like ``|``, because it would not fit
|
||||
established Python structures. The syntax of patterns following the
|
||||
keyword is discussed below.
|
||||
|
||||
|
@ -203,7 +229,7 @@ rules:
|
|||
...
|
||||
|
||||
This may look awkward to the eye of a Python programmer, because
|
||||
everywhere else colon is followed by an indent. The ``match`` would
|
||||
everywhere else a colon is followed by an indent. The ``match`` would
|
||||
neither follow the syntactic scheme of simple nor composite statements
|
||||
but rather establish a category of its own.
|
||||
|
||||
|
@ -229,9 +255,9 @@ Although flat indentation would save some horizontal space, the cost of
|
|||
increased complexity or unusual rules is too high. It would also complicate
|
||||
life for simple-minded code editors. Finally, the horizontal space issue can
|
||||
be alleviated by allowing "half-indent" (i.e. two spaces instead of four)
|
||||
for match statements.
|
||||
for match statements (though we do not recommend this).
|
||||
|
||||
In sample programs using match, written as part of the development of this
|
||||
In sample programs using ``match``, written as part of the development of this
|
||||
PEP, a noticeable improvement in code brevity is observed, more than making
|
||||
up for the additional indentation level.
|
||||
|
||||
|
@ -239,7 +265,7 @@ up for the additional indentation level.
|
|||
*Statement vs. Expression.* Some suggestions centered around the idea of
|
||||
making ``match`` an expression rather than a statement. However, this
|
||||
would fit poorly with Python's statement-oriented nature and lead to
|
||||
unusually long and complex expressions with the need to invent new
|
||||
unusually long and complex expressions and the need to invent new
|
||||
syntactic constructs or break well established syntactic rules. An
|
||||
obvious consequence of ``match`` as an expression would be that case
|
||||
clauses could no longer have arbitrary blocks of code attached, but only
|
||||
|
@ -247,8 +273,46 @@ a single expression. Overall, the strong limitations could in no way
|
|||
offset the slight simplification in some special use cases.
|
||||
|
||||
|
||||
*Hard vs. Soft Keyword.* There were options to make ``match`` a hard keyword,
|
||||
or choose a different keyword. Although using a hard keyword would simplify
|
||||
life for simple-minded syntax highlighters, we decided not to use a hard
|
||||
keyword for several reasons:
|
||||
|
||||
Match semantics
|
||||
- Most importantly, the new parser doesn't require us to do this. Unlike
|
||||
with ``async``, which caused hardships by being a soft keyword for a few
|
||||
releases, here we can make ``match`` a permanent soft keyword.
|
||||
|
||||
- ``match`` is so commonly used in existing code, that it would break
|
||||
almost every existing program and would put the burden of fixing code on many
|
||||
people who may not even benefit from the new syntax.
|
||||
|
||||
- It is hard to find an alternative keyword that would not be commonly used
|
||||
in existing programs as an identifier, and would still clearly reflect the
|
||||
meaning of the statement.
|
||||
|
||||
|
||||
**Use "as" or "|" instead of "case" for case clauses.**
|
||||
The pattern matching proposed here is a combination of multi-branch control
|
||||
flow (in line with ``switch`` in Algol-derived languages or ``cond`` in Lisp)
|
||||
and object-deconstruction as found in functional languages. While the proposed
|
||||
keyword ``case`` highlights the multi-branch aspect, alternative keywords such
|
||||
as ``as`` would equally be possible, highlighting the deconstruction aspect.
|
||||
``as`` or ``with``, for instance, also have the advantage of already being
|
||||
keywords in Python. However, since ``case`` as a keyword can only occur as a
|
||||
leading keyword inside a ``match`` statement, it is easy for a parser to
|
||||
distinguish between its use as a keyword or as a variable.
|
||||
|
||||
Other variants would use a symbol like ``|`` or ``=>``, or go entirely without
|
||||
a special marker.
|
||||
|
||||
Since Python is a statement-oriented language in the tradition of Algol, and as
|
||||
each composite statement starts with an identifying keyword, ``case`` seemed to
|
||||
be most in line with Python's style and traditions.
|
||||
|
||||
|
||||
|
||||
|
||||
Match Semantics
|
||||
~~~~~~~~~~~~~~~
|
||||
|
||||
The patterns of different case clauses might overlap in that more than
|
||||
|
@ -290,8 +354,8 @@ unintuitive and surprising behavior.
|
|||
A direct consequence of this is that any variable bindings outlive the
|
||||
respective case or match statements. Even patterns that only match a
|
||||
subject partially might bind local variables (this is, in fact, necessary
|
||||
for guards to function properly). However, this escaping of variable
|
||||
bindings is in line with existing Python structures such as for loops and
|
||||
for guards to function properly). However, these semantics for variable
|
||||
binding are in line with existing Python structures such as for loops and
|
||||
with statements.
|
||||
|
||||
|
||||
|
@ -301,9 +365,9 @@ Guards
|
|||
Some constraints cannot be adequately expressed through patterns alone.
|
||||
For instance, a 'less' or 'greater than' relationship defies the usual
|
||||
'equal' semantics of patterns. Moreover, different subpatterns are
|
||||
independent and cannot refer to each other. The addition of _guards_
|
||||
independent and cannot refer to each other. The addition of *guards*
|
||||
addresses these restrictions: a guard is an arbitrary expression attached
|
||||
to a pattern and that must evaluate to ``True`` for the pattern to succeed.
|
||||
to a pattern that must evaluate to a "truthy" value for the pattern to succeed.
|
||||
|
||||
For example, ``case [x, y] if x < y:`` uses a guard (``if x < y``) to
|
||||
express a 'less than' relationship between two otherwise disjoint capture
|
||||
|
@ -312,15 +376,15 @@ patterns ``x`` and ``y``.
|
|||
From a conceptual point of view, patterns describe structural constraints
|
||||
on the subject in a declarative style, ideally without any side-effects.
|
||||
Recall, in particular, that patterns are clearly distinct from expressions,
|
||||
following different objectives and semantics. Guards then enhance the
|
||||
patterns in a highly controlled way with arbitrary expressions (that might
|
||||
have side effects). Splitting the overal pattern into a static structural
|
||||
and a dynamic 'evaluative' part not only helps with readability, but can
|
||||
following different objectives and semantics. Guards then enhance case
|
||||
blocks in a highly controlled way with arbitrary expressions (that might
|
||||
have side effects). Splitting the overall functionality into a static structural
|
||||
and a dynamically evaluated part not only helps with readability, but can
|
||||
also introduce dramatic potential for compiler optimizations. To keep this
|
||||
clear separation, guards are only supported on the level of case clauses
|
||||
and not for individual patterns.
|
||||
|
||||
Example using guards::
|
||||
**Example** using guards::
|
||||
|
||||
def sort(seq):
|
||||
match seq:
|
||||
|
@ -354,64 +418,84 @@ seen as a prototype to pattern matching in Python, there is only one
|
|||
Full pattern matching differs from this in that there is more variety
|
||||
in structural patterns but only a minimum of binding patterns.
|
||||
|
||||
Patterns differ from assignment targets (as in iterable unpacking) in that
|
||||
they impose additional constraints on the structure of the subject and in
|
||||
that a subject might safely fail to match a specific pattern at any point
|
||||
Patterns differ from assignment targets (as in iterable unpacking) in two ways:
|
||||
they impose additional constraints on the structure of the subject, and
|
||||
a subject may safely fail to match a specific pattern at any point
|
||||
(in iterable unpacking, this constitutes an error). The latter means that
|
||||
pattern should avoid side effects wherever possible, including binding
|
||||
values to attributes or subscripts.
|
||||
patterns should avoid side effects wherever possible.
|
||||
|
||||
This desire to avoid side effects is one reason why capture patterns
|
||||
don't allow binding values to attributes or subscripts: if the
|
||||
containing pattern were to fail in a later step, it would be hard to
|
||||
revert such bindings.
|
||||
|
||||
A cornerstone of pattern matching is the possibility of arbitrarily
|
||||
*nesting patterns*. The nesting allows for expressing deep
|
||||
*nesting patterns*. The nesting allows expressing deep
|
||||
tree structures (for an example of nested class patterns, see the motivation
|
||||
section above) as well as alternatives.
|
||||
|
||||
Although the structural patterns might superficially look like expressions,
|
||||
Although patterns might superficially look like expressions,
|
||||
it is important to keep in mind that there is a clear distinction. In fact,
|
||||
no pattern is or contains an expression. It is more productive to think of
|
||||
patterns as declarative elements similar to the formal parameters in a
|
||||
function definition.
|
||||
|
||||
|
||||
Walrus/AS patterns
|
||||
~~~~~~~~~~~~~~~~~~
|
||||
AS Patterns
|
||||
~~~~~~~~~~~
|
||||
|
||||
Patterns fall into two categories: most patterns impose a (structural)
|
||||
constraint that the subject needs to fulfill, whereas the capture pattern
|
||||
binds the subject to a name without regard for the subject's structure or
|
||||
actual value. Consequently, a pattern can either express a constraint or
|
||||
bind a value, but not both. Walrus/AS patterns fill this gap in that they
|
||||
bind a value, but not both. AS patterns fill this gap in that they
|
||||
allow the user to specify a general pattern as well as capture the subject
|
||||
in a variable.
|
||||
|
||||
Typical use cases for the Walrus/AS pattern include OR and Class patterns
|
||||
together with a binding name as in, e.g., ``case BinOp(op := '+'|'-', ...):``
|
||||
or ``case [first := int(), second := int()]:``. The latter could be
|
||||
Typical use cases for the AS pattern include OR and Class patterns
|
||||
together with a binding name as in, e.g., ``case BinOp('+'|'-' as op, ...):``
|
||||
or ``case [int() as first, int() as second]:``. The latter could be
|
||||
understood as saying that the subject must fulfil two distinct patterns:
|
||||
``[first, second]`` as well as ``[int(), int()]``. The Walrus/AS pattern
|
||||
``[first, second]`` as well as ``[int(), int()]``. The AS pattern
|
||||
can thus be seen as a special case of an 'and' pattern (see OR patterns
|
||||
below for an additional discussion of 'and' patterns).
|
||||
|
||||
Example using the Walrus/AS pattern::
|
||||
In an earlier version, the AS pattern was devised as a 'Walrus pattern',
|
||||
written as ``case [first:=int(), second:=int()]``. However, using ``as``
|
||||
offers some advantages over ``:=``:
|
||||
|
||||
- The walrus operator ``:=`` is used to capture the result of an expression
|
||||
on the right hand side, whereas ``as`` generally indicates some form of
|
||||
'processing' as in ``import foo as bar`` or ``except E as err:``. Indeed,
|
||||
the pattern ``P as x`` does not assign the pattern ``P`` to ``x``, but
|
||||
rather the subject that successfully matches ``P``.
|
||||
|
||||
- ``as`` allows for a more consistent data flow from left to right (the
|
||||
attributes in Class patterns also follow a left-to-right data flow).
|
||||
|
||||
- The walrus operator looks very similar to the syntax for matching attributes in the Class pattern,
|
||||
potentially leading to some confusion.
|
||||
|
||||
**Example** using the AS pattern::
|
||||
|
||||
def simplify_expr(tokens):
|
||||
match tokens:
|
||||
case [l:=('('|'['), *expr, r:=(')'|']')] if (l+r) in ('()', '[]'):
|
||||
case [('('|'[') as l, *expr, (')'|']') as r] if (l+r) in ('()', '[]'):
|
||||
return simplify_expr(expr)
|
||||
case [0, op:=('+'|'-'), right]:
|
||||
case [0, ('+'|'-') as op, right]:
|
||||
return UnaryOp(op, right)
|
||||
case [left:=(int() | float()) | Num(left), '+', right:=(int() | float()) | Num(right)]:
|
||||
case [(int() | float() as left) | Num(left), '+', (int() | float() as right) | Num(right)]:
|
||||
return Num(left + right)
|
||||
case [value:=(int() | float())]
|
||||
case [(int() | float()) as value]:
|
||||
return Num(value)
|
||||
|
||||
|
||||
OR patterns
|
||||
OR Patterns
|
||||
~~~~~~~~~~~
|
||||
|
||||
The OR pattern allows you to combine 'structurally equivalent' alternatives
|
||||
into a new pattern, i.e. several patterns can share a common handler. If any
|
||||
one of an OR pattern's subpatterns matches the given subject, the entire OR
|
||||
of an OR pattern's subpatterns matches the subject, the entire OR
|
||||
pattern succeeds.
|
||||
|
||||
Statically typed languages prohibit the binding of names (capture patterns)
|
||||
|
@ -422,13 +506,16 @@ must bind the same set of variables so as not to leave potentially undefined
|
|||
names. With two alternatives ``P | Q``, this means that if *P* binds the
|
||||
variables *u* and *v*, *Q* must bind exactly the same variables *u* and *v*.
|
||||
|
||||
There was some discussion on whether to use the bar ``|`` or the keyword
|
||||
``or`` in order to separate alternatives. The OR pattern does not fully fit
|
||||
There was some discussion on whether to use the bar symbol ``|`` or the ``or``
|
||||
keyword to separate alternatives. The OR pattern does not fully fit
|
||||
the existing semantics and usage of either of these two symbols. However,
|
||||
``|`` is the symbol of choice in all programming languages with support of
|
||||
the OR pattern and is even used in that capacity for regular expressions in
|
||||
Python as well. Moreover, ``|`` is not only used for bitwise OR, but also
|
||||
the OR pattern and is used in that capacity for regular expressions in
|
||||
Python as well. It is also the traditional separator between alternatives
|
||||
in formal grammars (including Python's).
|
||||
Moreover, ``|`` is not only used for bitwise OR, but also
|
||||
for set unions and dict merging (:pep:`584`).
|
||||
|
||||
Other alternatives were considered as well, but none of these would allow
|
||||
OR-patterns to be nested inside other patterns:
|
||||
|
||||
|
@ -468,8 +555,9 @@ OR-patterns to be nested inside other patterns:
|
|||
print("A corner of the unit square")
|
||||
|
||||
|
||||
*AND and NOT patterns.*
|
||||
This proposal defines an OR-pattern (|) to match one of several alternates;
|
||||
**AND and NOT Patterns**
|
||||
|
||||
Since this proposal defines an OR-pattern (``|``) to match one of several alternates,
|
||||
why not also an AND-pattern (``&``) or even a NOT-pattern (``!``)?
|
||||
Especially given that some other languages (``F#`` for example) support
|
||||
AND-patterns.
|
||||
|
@ -480,27 +568,28 @@ all attributes and elements mentioned must be present for the match to
|
|||
succeed. Guard conditions can also support many of the use cases that a
|
||||
hypothetical 'and' operator would be used for.
|
||||
|
||||
A negation of a match pattern using the operator ``!`` as a prefix would match
|
||||
exactly if the pattern itself does not match. For instance, ``!(3 | 4)``
|
||||
would match anything except ``3`` or ``4``. However, there is evidence from
|
||||
other languages that this is rarely useful and primarily used as double
|
||||
negation ``!!`` to control variable scopes and prevent variable bindings
|
||||
(which does not apply to Python). Other use cases are better expressed using
|
||||
guards.
|
||||
A negation of a match pattern using the operator ``!`` as a prefix
|
||||
would match exactly if the pattern itself does not match. For
|
||||
instance, ``!(3 | 4)`` would match anything except ``3`` or ``4``.
|
||||
However, there is `evidence from other languages
|
||||
<https://dl.acm.org/doi/abs/10.1145/2480360.2384582>`_ that this is
|
||||
rarely useful, and primarily used as double negation ``!!`` to control
|
||||
variable scopes and prevent variable bindings (which does not apply to
|
||||
Python). Other use cases are better expressed using guards.
|
||||
|
||||
In the end, it was decided that this would make the syntax more complex
|
||||
without adding a significant benefit.
|
||||
without adding a significant benefit. It can always be added later.
|
||||
|
||||
|
||||
Example using the OR pattern::
|
||||
**Example** using the OR pattern::
|
||||
|
||||
def simplify(expr):
|
||||
match expr:
|
||||
case ('/', 0, 0):
|
||||
return expr
|
||||
case ('*' | '/', 0, _):
|
||||
case ('*'|'/', 0, _):
|
||||
return 0
|
||||
case ('+' | '-', x, 0) | ('+', 0, x) | ('*', 1, x) | ('*' | '/', x, 1):
|
||||
case ('+'|'-', x, 0) | ('+', 0, x) | ('*', 1, x) | ('*'|'/', x, 1):
|
||||
return x
|
||||
return expr
|
||||
|
||||
|
@ -511,8 +600,8 @@ Literal Patterns
|
|||
~~~~~~~~~~~~~~~~
|
||||
|
||||
Literal patterns are a convenient way for imposing constraints on the
|
||||
value of a subject, rather than its type or structure. Literal patterns
|
||||
even allow you to emulate a switch statement using pattern matching.
|
||||
value of a subject, rather than its type or structure. They also
|
||||
allow you to emulate a switch statement using pattern matching.
|
||||
|
||||
Generally, the subject is compared to a literal pattern by means of standard
|
||||
equality (``x == y`` in Python syntax). Consequently, the literal patterns
|
||||
|
@ -522,7 +611,7 @@ match the same set of objects because ``True == 1`` holds. However, we
|
|||
believe that many users would be surprised finding that ``case True:``
|
||||
matched the subject ``1.0``, resulting in some subtle bugs and convoluted
|
||||
workarounds. We therefore adopted the rule that the three singleton
|
||||
objects ``None``, ``False`` and ``True`` match by identity (``x is y`` in
|
||||
patterns ``None``, ``False`` and ``True`` match by identity (``x is y`` in
|
||||
Python syntax) rather than equality. Hence, ``case True:`` will match only
|
||||
``True`` and nothing else. Note that ``case 1:`` would still match ``True``,
|
||||
though, because the literal pattern ``1`` works by equality and not identity.
|
||||
|
@ -530,20 +619,22 @@ though, because the literal pattern ``1`` works by equality and not identity.
|
|||
Early ideas to induce a hierarchy on numbers so that ``case 1.0`` would
|
||||
match both the integer ``1`` and the floating point number ``1.0``, whereas
|
||||
``case 1:`` would only match the integer ``1`` were eventually dropped in
|
||||
favor of the simpler and consistent rule based on equality. Moreover, any
|
||||
favor of the simpler and more consistent rule based on equality. Moreover, any
|
||||
additional checks whether the subject is an instance of ``numbers.Integral``
|
||||
would come at a high runtime cost to introduce what would essentially be
|
||||
novel in Python. When needed, the explicit syntax ``case int(1):`` might
|
||||
a novel idea in Python. When needed, the explicit syntax ``case int(1):`` can
|
||||
be used.
|
||||
|
||||
Recall that literal patterns are *not* expressions, but directly denote a
|
||||
specific value or object. From a syntactical point of view, we have to
|
||||
ensure that negative and complex numbers can equally be used as patterns,
|
||||
although they are not atomic literal values (i.e. the seeming literal value
|
||||
``-3+4j`` would syntactically be an expression of the form
|
||||
``BinOp(UnaryOp('-', 3), '+', 4j)``, but as expressions are not part of
|
||||
patterns, we added syntactic support for such complex value literals without
|
||||
having to resort to full expressions). Interpolated *f*-strings, on the
|
||||
Recall that literal patterns are *not* expressions, but directly
|
||||
denote a specific value. From a pragmatic point of view, we want to
|
||||
allow using negative and even complex values as literal patterns, but
|
||||
they are not atomic literals (only unsigned real and imaginary numbers
|
||||
are). E.g., ``-3+4j`` is syntactically an expression of the form
|
||||
``BinOp(UnaryOp('-', 3), '+', 4j)``. Since expressions are not part
|
||||
of patterns, we had to add explicit syntactic support for such values
|
||||
without having to resort to full expressions.
|
||||
|
||||
Interpolated *f*-strings, on the
|
||||
other hand, are not literal values, despite their appearance and can
|
||||
therefore not be used as literal patterns (string concatenation, however,
|
||||
is supported).
|
||||
|
@ -551,7 +642,27 @@ is supported).
|
|||
Literal patterns not only occur as patterns in their own right, but also
|
||||
as keys in *mapping patterns*.
|
||||
|
||||
Example using Literal patterns::
|
||||
|
||||
**Range matching patterns.**
|
||||
This would allow patterns such as ``1...6``. However, there are a host of
|
||||
ambiguities:
|
||||
|
||||
* Is the range open, half-open, or closed? (I.e. is ``6`` included in the
|
||||
above example or not?)
|
||||
* Does the range match a single number, or a range object?
|
||||
* Range matching is often used for character ranges ('a'...'z') but that
|
||||
won't work in Python since there's no character data type, just strings.
|
||||
* Range matching can be a significant performance optimization if you can
|
||||
pre-build a jump table, but that's not generally possible in Python due
|
||||
to the fact that names can be dynamically rebound.
|
||||
|
||||
Rather than creating a special-case syntax for ranges, it was decided
|
||||
that allowing custom pattern objects (``InRange(0, 6)``) would be more flexible
|
||||
and less ambiguous; however, those ideas have been postponed for the time
|
||||
being.
|
||||
|
||||
|
||||
**Example** using Literal patterns::
|
||||
|
||||
def simplify(expr):
|
||||
match expr:
|
||||
|
@ -579,7 +690,7 @@ Capture Patterns
|
|||
|
||||
Capture patterns take on the form of a name that accepts any value and binds
|
||||
it to a (local) variable (unless the name is declared as ``nonlocal`` or
|
||||
``global``). In that sense, a simple capture pattern is basically equivalent
|
||||
``global``). In that sense, a capture pattern is similar
|
||||
to a parameter in a function definition (when the function is called, each
|
||||
parameter binds the respective argument to a local variable in the function's
|
||||
scope).
|
||||
|
@ -599,7 +710,7 @@ repeated use of names later on.
|
|||
There were calls to explicitly mark capture patterns and thus identify them
|
||||
as binding targets. According to that idea, a capture pattern would be
|
||||
written as, e.g. ``?x``, ``$x`` or ``=x``. The aim of such explicit capture
|
||||
markers is to let an unmarked name be a constant value pattern (see below).
|
||||
markers is to let an unmarked name be a value pattern (see below).
|
||||
However, this is based on the misconception that pattern matching was an
|
||||
extension of *switch* statements, placing the emphasis on fast switching based
|
||||
on (ordinal) values. Such a *switch* statement has indeed been proposed for
|
||||
|
@ -611,7 +722,13 @@ betray the objective of the proposed pattern matching syntax and simplify
|
|||
a secondary use case at the expense of additional syntactic clutter for
|
||||
core cases.
|
||||
|
||||
Example using Capture patterns::
|
||||
It has been proposed that capture patterns are not needed at all,
|
||||
since the equivalent effect can be obtained by combining an AS
|
||||
pattern with a wildcard pattern (e.g., ``case _ as x`` is equivalent
|
||||
to ``case x``). However, this would be unpleasantly verbose,
|
||||
especially given that we expect capture patterns to be very common.
|
||||
|
||||
**Example** using Capture patterns::
|
||||
|
||||
def average(*args):
|
||||
match args:
|
||||
|
@ -621,8 +738,8 @@ Example using Capture patterns::
|
|||
return x
|
||||
case []:
|
||||
return 0
|
||||
case x: # captures the entire sequence
|
||||
return sum(x) / len(x)
|
||||
case a: # captures the entire sequence
|
||||
return sum(a) / len(a)
|
||||
|
||||
|
||||
.. _wildcard_pattern:
|
||||
|
@ -660,27 +777,38 @@ of items is omitted::
|
|||
case [a, ..., z]: ...
|
||||
case [a, *, z]: ...
|
||||
|
||||
Both examples look like the would match a sequence of at two or more items,
|
||||
capturing the first and last values.
|
||||
Either example looks like it would match a sequence of two or more
|
||||
items, capturing the first and last values. While that may be the
|
||||
ultimate "wildcard", it does not convey the desired semantics.
|
||||
|
||||
A single wildcard clause (i.e. ``case _:``) is semantically equivalent to
|
||||
an ``else:``. It accepts any subject without binding it to a variable or
|
||||
performing any other operation. However, the wildcard pattern is in
|
||||
contrast to ``else`` usable as a subpattern in nested patterns.
|
||||
An alternative that does not suggest an arbitrary number of items
|
||||
would be ``?``. However, this would require changes in the tokenizer,
|
||||
and it would put Python in a rather unique position:
|
||||
|
||||
Finally note that the underscore is as a wildcard pattern in *every*
|
||||
The underscore is used as a wildcard pattern in *every*
|
||||
programming language with pattern matching that we could find
|
||||
(including *C#*, *Elixir*, *Erlang*, *F#*, *Grace*, *Haskell*,
|
||||
*Mathematica*, *OCaml*, *Ruby*, *Rust*, *Scala*, *Swift*, and *Thorn*).
|
||||
Keeping in mind that many users of Python also work with other programming
|
||||
languages, have prior experience when learning Python, or moving on to
|
||||
other languages after having learnt Python, we find that such well
|
||||
established standards are important and relevant with respect to
|
||||
languages, have prior experience when learning Python, and may move on to
|
||||
other languages after having learned Python, we find that such
|
||||
well-established standards are important and relevant with respect to
|
||||
readability and learnability. In our view, concerns that this wildcard
|
||||
means that a regular name receives special treatment are not strong
|
||||
enough to introduce syntax that would make Python special.
|
||||
|
||||
Example using the Wildcard pattern::
|
||||
*Else blocks.* A case block without a guard whose pattern is a single
|
||||
wildcard (i.e., ``case _:``) accepts any subject without binding it to
|
||||
a variable or performing any other operation. It is thus semantically
|
||||
equivalent to ``else:``, if it were supported. However, adding such
|
||||
an else block to the match statement syntax would not remove the need
|
||||
for the wildcard pattern in other contexts. Another argument against
|
||||
this is that there would be two plausible indentation levels for an
|
||||
else block: aligned with ``case`` or aligned with ``match``. The
|
||||
authors have found it quite contentious which indentation level to
|
||||
prefer.
|
||||
|
||||
**Example** using the Wildcard pattern::
|
||||
|
||||
def is_closed(sequence):
|
||||
match sequence:
|
||||
|
@ -692,45 +820,43 @@ Example using the Wildcard pattern::
|
|||
return False
|
||||
|
||||
|
||||
.. _constant_value_pattern:
|
||||
.. _value_pattern:
|
||||
|
||||
Value Patterns
|
||||
~~~~~~~~~~~~~~
|
||||
|
||||
It is good programming style to use named constants for parametric values or
|
||||
to clarify the meaning of particular values. Clearly, it would be desirable
|
||||
to write ``case (HttpStatus.OK, body):`` rather than
|
||||
to clarify the meaning of particular values. Clearly, it would be preferable
|
||||
to write ``case (HttpStatus.OK, body):`` over
|
||||
``case (200, body):``, for example. The main issue that arises here is how to
|
||||
distinguish capture patterns (variables) from constant value patterns. The
|
||||
distinguish capture patterns (variable bindings) from value patterns. The
|
||||
general discussion surrounding this issue has brought forward a plethora of
|
||||
options, which we cannot all fully list here.
|
||||
|
||||
Strictly speaking, constant value patterns are not really necessary, but
|
||||
Strictly speaking, value patterns are not really necessary, but
|
||||
could be implemented using guards, i.e.
|
||||
``case (status, body) if status == HttpStatus.OK:``. Nonetheless, the
|
||||
convenience of constant value patterns is unquestioned and obvious.
|
||||
convenience of value patterns is unquestioned and obvious.
|
||||
|
||||
The observation that constants tend to be written in uppercase letters or
|
||||
collected in enumeration-like namespaces suggests possible rules to discern
|
||||
constants syntactically. However, the idea of using upper vs. lower case as
|
||||
constants syntactically. However, the idea of using upper- vs. lowercase as
|
||||
a marker has been met with scepticism since there is no similar precedent
|
||||
in core Python (although it is common in other languages). We therefore only
|
||||
adopted the rule that any dotted name (i.e. attribute access) is to be
|
||||
interpreted as a constant value pattern like ``HttpStatus.OK``
|
||||
above. This precludes, in particular, local variables from acting as
|
||||
constants.
|
||||
adopted the rule that any dotted name (i.e., attribute access) is to be
|
||||
interpreted as a value pattern, for example ``HttpStatus.OK``
|
||||
above. This precludes, in particular, local variables and global
|
||||
variables defined in the current module from acting as constants.
|
||||
|
||||
Global variables can only be directly used as constant when defined in other
|
||||
modules, although there are workarounds to access the current module as a
|
||||
namespace as well. A proposed rule to use a leading dot (e.g.
|
||||
A proposed rule to use a leading dot (e.g.
|
||||
``.CONSTANT``) for that purpose was criticised because it was felt that the
|
||||
dot would not be a visible-enough marker for that purpose. Partly inspired
|
||||
by use cases in other programming languages, a number of different
|
||||
by forms found in other programming languages, a number of different
|
||||
markers/sigils were proposed (such as ``^CONSTANT``, ``$CONSTANT``,
|
||||
``==CONSTANT``, ``CONSTANT?``, or the word enclosed in backticks), although
|
||||
there was no obvious or natural choice. The current proposal therefore
|
||||
leaves the discussion and possible introduction of such a 'constant' marker
|
||||
for future PEPs.
|
||||
for a future PEP.
|
||||
|
||||
Distinguishing the semantics of names based on whether it is a global
|
||||
variable (i.e. the compiler would treat global variables as constants rather
|
||||
|
@ -740,7 +866,7 @@ patterns. Moreover, pattern matching could not be used directly inside a
|
|||
module's scope because all variables would be global, making capture
|
||||
patterns impossible.
|
||||
|
||||
Example using the Value pattern::
|
||||
**Example** using the Value pattern::
|
||||
|
||||
def handle_reply(reply):
|
||||
match reply:
|
||||
|
@ -782,7 +908,7 @@ iterable.
|
|||
possible.
|
||||
|
||||
- A starred pattern will capture a sub-sequence of arbitrary length,
|
||||
mirroring iterable unpacking as well. Only one starred item may be
|
||||
again mirroring iterable unpacking. Only one starred item may be
|
||||
present in any sequence pattern. In theory, patterns such as ``(*_, 3, *_)``
|
||||
could be understood as expressing any sequence containing the value ``3``.
|
||||
In practice, however, this would only work for a very narrow set of use
|
||||
|
@ -790,31 +916,36 @@ iterable.
|
|||
|
||||
- The sequence pattern does *not* iterate through an iterable subject. All
|
||||
elements are accessed through subscripting and slicing, and the subject must
|
||||
be an instance of ``collections.abc.Sequence`` (including, in particular,
|
||||
lists and tuples, but excluding strings and bytes, as well as sets and
|
||||
dictionaries).
|
||||
be an instance of ``collections.abc.Sequence``. This includes, of course,
|
||||
lists and tuples, but excludes e.g. sets and dictionaries. While it would
|
||||
include strings and bytes, we make an exception for these (see below).
|
||||
|
||||
A sequence pattern cannot just iterate through any iterable object. The
|
||||
consumption of elements from the iteration would have to be undone if the
|
||||
overall pattern fails, which is not possible.
|
||||
overall pattern fails, which is not feasible.
|
||||
|
||||
Relying on ``len()`` and subscripting and slicing alone does not work to
|
||||
identify sequences because sequences share the protocol with more general
|
||||
maps (dictionaries) in this regard. It would be surprising if a sequence
|
||||
pattern also matched dictionaries or other custom objects that implement
|
||||
To identify sequences we cannot rely on ``len()`` and subscripting and
|
||||
slicing alone, because sequences share these protocols with mappings
|
||||
(e.g. ``dict``) in this regard. It would be surprising if a sequence
|
||||
pattern also matched dictionaries or other objects implementing
|
||||
the mapping protocol (i.e. ``__getitem__``). The interpreter therefore
|
||||
performs an instance check to ensure that the subject in question really
|
||||
is a sequence (of known type).
|
||||
is a sequence (of known type). (As an optimization of the most common
|
||||
case, if the subject is exactly a list or a tuple, the instance check
|
||||
can be skipped.)
|
||||
|
||||
String and bytes objects have a dual nature: they are both 'atomic' objects
|
||||
in their own right, as well as sequences (with a strongly recursive nature
|
||||
in that a string is a sequence of strings). The typical behavior and use
|
||||
cases for strings and bytes are different enough from that of tuples and
|
||||
cases for strings and bytes are different enough from those of tuples and
|
||||
lists to warrant a clear distinction. It is in fact often unintuitive and
|
||||
unintended that strings pass for sequences as evidenced by regular questions
|
||||
unintended that strings pass for sequences, as evidenced by regular questions
|
||||
and complaints. Strings and bytes are therefore not matched by a sequence
|
||||
pattern, limiting the sequence pattern to a very specific understanding of
|
||||
'sequence'.
|
||||
'sequence'. The built-in ``bytearray`` type, being a mutable version of
|
||||
``bytes``, also deserves an exception; but we don't intend to
|
||||
enumerate all other types that may be used to represent bytes
|
||||
(e.g. some, but not all, instances of ``memoryview`` and ``array.array``).
|
||||
|
||||
|
||||
.. _mapping_pattern:
|
||||
|
@ -823,9 +954,9 @@ Mapping Patterns
|
|||
~~~~~~~~~~~~~~~~
|
||||
|
||||
Dictionaries or mappings in general are one of the most important and most
|
||||
widely used data structures in Python. In contrast to sequences mappings
|
||||
are built for fast direct access to arbitrary elements (identified by a key).
|
||||
In most use cases an element is retrieved from a dictionary by a known key
|
||||
widely used data structures in Python. In contrast to sequences, mappings
|
||||
are built for fast direct access to arbitrary elements identified by a key.
|
||||
In most cases an element is retrieved from a dictionary by a known key
|
||||
without regard for any ordering or other key-value pairs stored in the same
|
||||
dictionary. Particularly common are string keys.
|
||||
|
||||
|
@ -836,13 +967,13 @@ pattern does not check for the presence of additional keys. Should it be
|
|||
necessary to impose an upper bound on the mapping and ensure that no
|
||||
additional keys are present, then the usual double-star-pattern ``**rest``
|
||||
can be used. The special case ``**_`` with a wildcard, however, is not
|
||||
supported as it would not have any effect, but might lead to a wrong
|
||||
supported as it would not have any effect, but might lead to an incorrect
|
||||
understanding of the mapping pattern's semantics.
|
||||
|
||||
To avoid overly expensive matching algorithms, keys must be literals or
|
||||
constant values.
|
||||
value patterns.
|
||||
|
||||
Example using the Mapping pattern::
|
||||
**Example** using the Mapping pattern::
|
||||
|
||||
def change_red_to_blue(json_obj):
|
||||
match json_obj:
|
||||
|
@ -858,10 +989,10 @@ Example using the Mapping pattern::
|
|||
Class Patterns
|
||||
~~~~~~~~~~~~~~
|
||||
|
||||
Class patterns fulfil two purposes: checking whether a given subject is
|
||||
indeed an instance of a specific class and extracting data from specific
|
||||
attributes of the subject. A quick survey revealed that ``isinstance()``
|
||||
is indeed one of the most often used functions in Python in terms of
|
||||
Class patterns fulfill two purposes: checking whether a given subject is
|
||||
indeed an instance of a specific class, and extracting data from specific
|
||||
attributes of the subject. Anecdotal evidence revealed that ``isinstance()``
|
||||
is one of the most often used functions in Python in terms of
|
||||
static occurrences in programs. Such instance checks typically precede
|
||||
a subsequent access to information stored in the object, or a possible
|
||||
manipulation thereof. A typical pattern might be along the lines of::
|
||||
|
@ -873,7 +1004,7 @@ manipulation thereof. A typical pattern might be along the lines of::
|
|||
elif isinstance(node, Leaf):
|
||||
print(node.value)
|
||||
|
||||
In many cases, however, class patterns occur nested as in the example
|
||||
In many cases class patterns occur nested, as in the example
|
||||
given in the motivation::
|
||||
|
||||
if (isinstance(node, BinOp) and node.op == "+"
|
||||
|
@ -881,8 +1012,8 @@ given in the motivation::
|
|||
a, b, c = node.left, node.right.left, node.right.right
|
||||
# Handle a + b*c
|
||||
|
||||
The class pattern lets you concisely specify both an instance-check as
|
||||
well as relevant attributes (with possible further constraints). It is
|
||||
The class pattern lets you concisely specify both an instance check
|
||||
and relevant attributes (with possible further constraints). It is
|
||||
thereby very tempting to write, e.g., ``case Node(left, right):`` in the
|
||||
first case above and ``case Leaf(value):`` in the second. While this
|
||||
indeed works well for languages with strict algebraic data types, it is
|
||||
|
@ -890,14 +1021,14 @@ problematic with the structure of Python objects.
|
|||
|
||||
When dealing with general Python objects, we face a potentially very large
|
||||
number of unordered attributes: an instance of ``Node`` contains a large
|
||||
number of attributes (most of which are 'private methods' such as, e.g.,
|
||||
``__repr__``). Moreover, the interpreter cannot reliably deduce which of
|
||||
the attributes comes first and which comes second. For an object that
|
||||
number of attributes (most of which are 'special methods' such as
|
||||
``__repr__``). Moreover, the interpreter cannot reliably deduce the
|
||||
ordering of attributes. For an object that
|
||||
represents a circle, say, there is no inherently obvious ordering of the
|
||||
attributes ``x``, ``y`` and ``radius``.
|
||||
|
||||
We envision two possibilities for dealing with this issue: either explicitly
|
||||
name the attributes of interest or provide an additional mapping that tells
|
||||
name the attributes of interest, or provide an additional mapping that tells
|
||||
the interpreter which attributes to extract and in which order. Both
|
||||
approaches are supported. Moreover, explicitly naming the attributes of
|
||||
interest lets you further specify the required structure of an object; if
|
||||
|
@ -948,6 +1079,20 @@ the explicit construction of instances, where class patterns ``c(p, q)``
|
|||
deliberately mirror the syntax of creating instances.
|
||||
|
||||
|
||||
**Type annotations for pattern variables.**
|
||||
The proposal was to combine patterns with type annotations::
|
||||
|
||||
match x:
|
||||
case [a: int, b: str]: print(f"An int {a} and a string {b}:")
|
||||
case [a: int, b: int, c: int]: print(f"Three ints", a, b, c)
|
||||
...
|
||||
|
||||
This idea has a lot of problems. For one, the colon can only
|
||||
be used inside of brackets or parens, otherwise the syntax becomes
|
||||
ambiguous. And because Python disallows ``isinstance()`` checks
|
||||
on generic types, type annotations containing generics will not
|
||||
work as expected.
|
||||
|
||||
|
||||
History and Context
|
||||
===================
|
||||
|
@ -1052,7 +1197,6 @@ Or you would combine these ideas to write ``Node(right=y)`` so as to require
|
|||
an instance of ``Node`` but only extract the value of the ``right`` attribute.
|
||||
|
||||
|
||||
|
||||
Copyright
|
||||
=========
|
||||
|
||||