PEP: 642
Title: Constraint Pattern Syntax for Structural Pattern Matching
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan@gmail.com>
BDFL-Delegate:
Discussions-To: Python-Dev <python-dev@python.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Requires: 634
Created: 26-Sep-2020
Python-Version: 3.10
Post-History: 31-Oct-2020, 8-Nov-2020
Resolution:
Abstract
========
This PEP covers an alternative syntax proposal for PEP 634's structural pattern
matching that explicitly anchors match patterns in the existing syntax for
assignment targets, while retaining most semantic aspects of the existing
proposal.
Specifically, this PEP adopts an additional design restriction that PEP 634's
authors considered unreasonable: that any novel match pattern semantics must
offer syntax that future PEPs could plausibly propose for adoption in assignment
targets. It is (reluctantly) considered acceptable to offer syntactic sugar that
is specific to match patterns, as long as there is an underlying more explicit
form that is compatible (or potentially compatible) with assignment targets.
As a consequence, this PEP proposes the following changes to the proposed match
pattern syntax:

* a new pattern type is introduced: "constraint patterns"
* constraint patterns are either equality constraints or identity constraints
* equality constraints use ``==`` as a prefix marker on an otherwise
  arbitrary primary expression: ``== EXPR``
* identity constraints use ``is`` as a prefix marker on an otherwise
  arbitrary primary expression: ``is EXPR``
* value patterns and literal patterns (with some exceptions) are redefined as
  "inferred equality constraints", and become a syntactic shorthand for an
  equality constraint
* ``None`` and ``...`` are defined as "inferred identity constraints" and become
  a syntactic shorthand for an identity constraint
* due to ambiguity of intent, neither ``True`` nor ``False`` is accepted as
  implying an inferred constraint (instead requiring the use of an explicit
  constraint, a class pattern, or a capture pattern with a guard expression)
* inferred constraints are *not* defined in the Abstract Syntax Tree. Instead,
  inferred constraints are converted to explicit constraints by the parser
* the wildcard pattern changes from ``_`` (single underscore) to ``__`` (double
  underscore), and gains a dedicated ``SkippedBinding`` node in the AST
* mapping patterns change to allow arbitrary primary expressions as keys

Relationship with other PEPs
============================
This PEP both depends on and competes with PEP 634 - the PEP author agrees that
match statements would be a sufficiently valuable addition to the language to
be worth the additional complexity that they add to the learning process, but
disagrees with the idea that "simple name vs literal or attribute lookup"
really offers an adequate syntactic distinction between name binding and value
lookup operations in match patterns. (Even though this PEP ultimately retained
that shorthand to reduce the verbosity of common use cases, it still redefines
it in terms of a more explicit underlying construct).
This PEP agrees with the spirit of PEP 640 (that the chosen wildcard pattern to
skip a name binding should be supported everywhere, not just in match patterns),
but is now proposing a different spelling for the wildcard syntax (``__`` rather
than ``?``). As such, it competes with PEP 640 as written, but would complement
a proposal to deprecate the use of ``__`` as an ordinary identifier and instead
turn it into a general purpose wildcard marker that always skips making a new
local variable binding.
Motivation
==========
The original PEP 622 (which was later split into PEP 634, PEP 635, and PEP 636)
incorporated an unstated but essential assumption in its syntax design: that
neither ordinary expressions *nor* the existing assignment target syntax provide
an adequate foundation for the syntax used in match patterns.
While the PEP didn't explicitly state this assumption, one of the PEP authors
explained it clearly on python-dev [1_]:

    The actual problem that I see is that we have different cultures/intuitions
    fundamentally clashing here. In particular, so many programmers welcome
    pattern matching as an "extended switch statement" and find it therefore
    strange that names are binding and not expressions for comparison. Others
    argue that it is at odds with current assignment statements, say, and
    question why dotted names are _/not/_ binding. What all groups seem to
    have in common, though, is that they refer to _/their/_ understanding and
    interpretation of the new match statement as 'consistent' or 'intuitive'
    --- naturally pointing out where we as PEP authors went wrong with our
    design.

    But here is the catch: at least in the Python world, pattern matching as
    proposed by this PEP is an unprecedented and new way of approaching a common
    problem. It is not simply an extension of something already there. Even
    worse: while designing the PEP we found that no matter from which angle you
    approach it, you will run into issues of seeming 'inconsistencies' (which is
    to say that pattern matching cannot be reduced to a 'linear' extension of
    existing features in a meaningful way): there is always something that goes
    fundamentally beyond what is already there in Python. That's why I argue
    that arguments based on what is 'intuitive' or 'consistent' just do not
    make sense _/in this case/_.

PEP 635 (and PEP 622 before it) makes a strong case that treating capture
patterns as the default usage for simple names in match patterns is the right
approach, and provides a number of examples where having names express value
constraints by default would be confusing (this difference from C/C++ switch
statement semantics is also a key reason it makes sense to use ``match`` as the
introductory keyword for the new statement rather than ``switch``).
However, PEP 635 doesn't even *try* to make the case for the second assertion,
that treating match patterns as a variation on assignment targets also leads to
inherent contradictions. Even a PR submitted to explicitly list this option in
the "Rejected Ideas" section of the original PEP 622 was declined [2_].
This PEP instead starts from the assumption that it *is* possible to treat match
patterns as a variation on assignment targets, and the only essential
differences that emerge relative to the syntactic proposal in PEP 634 are:

* a requirement to offer an explicit marker prefix for value lookups rather than
  only allowing them to be inferred from the use of dotted names or literals; and
* a requirement to use a non-binding wildcard marker other than ``_``.

This PEP proposes constraint expressions as a way of addressing the first point,
and changes the proposed non-binding wildcard marker to a double-underscore to
address the latter.
PEP 634 also proposes special casing the literals ``None``, ``True``, and
``False`` so that they're compared by identity when written directly as a
literal pattern, but by equality when referenced by a value pattern. This PEP
eliminates the need for those special cases by proposing distinct syntax for
matching by identity and matching by equality, but does accept the convenience
and consistency argument in allowing ``None`` as a shorthand for ``is None``.
Specification
=============
This PEP retains the overall ``match``/``case`` statement syntax from PEP 634, and
retains both the syntax and semantics for the following match pattern variants:
* class patterns
* group patterns
* sequence patterns
Pattern combination (both OR and AS patterns) and guard expressions also remain
the same as they are in PEP 634.
Capture patterns are essentially unchanged, except that ``_`` becomes a regular
capture pattern, due to the wildcard pattern marker changing to ``__``.
Constraint patterns are added, offering equality constraints and identity
constraints.
Literal patterns and value patterns are replaced by inferred constraint
patterns, offering inferred equality constraints for strings, numbers and
attribute lookups, and inferred identity constraints for ``None`` and ``...``.
Mapping patterns change to allow arbitrary primary expressions for keys, rather
than being restricted to literal patterns or value patterns.
Wildcard patterns are changed to use ``__`` (double underscore) rather than
``_`` (single underscore), and are also given a new dedicated node in the
Abstract Syntax Tree produced by the parser.
Constraint patterns
-------------------
Constraint patterns use the following simplified syntax::

    constraint_pattern: id_constraint | eq_constraint
    eq_constraint: '==' primary
    id_constraint: 'is' primary

The constraint expression is an arbitrary primary expression - it can be a
simple name, a dotted name lookup, a literal, a function call, or any other
primary expression.
If this PEP were to be adopted in preference to PEP 634, then all literal and
value patterns could instead be written more explicitly as constraint patterns::

    # Literal patterns
    match number:
        case == 0:
            print("Nothing")
        case == 1:
            print("Just one")
        case == 2:
            print("A couple")
        case == (-1):
            print("One less than nothing")
        case == (1-1j):
            print("Good luck with that...")

    # Additional literal patterns
    match value:
        case == True:
            print("True or 1")
        case == False:
            print("False or 0")
        case == None:
            print("None")
        case == "Hello":
            print("Text 'Hello'")
        case == b"World!":
            print("Binary 'World!'")
        case == ...:
            print("May be useful when writing __getitem__ methods?")

    # Matching by identity rather than equality
    SENTINEL = object()
    match value:
        case is True:
            print("True, not 1")
        case is False:
            print("False, not 0")
        case is None:
            print("None, following PEP 8 comparison guidelines")
        case is SENTINEL:
            print("Matches the sentinel by identity, not just value")

    # Constant value patterns
    from enum import Enum

    class Sides(str, Enum):
        SPAM = "Spam"
        EGGS = "eggs"
        ...

    preferred_side = Sides.EGGS
    match entree[-1]:
        case == Sides.SPAM:  # Compares entree[-1] == Sides.SPAM.
            response = "Have you got anything without Spam?"
        case == preferred_side:  # Compares entree[-1] == preferred_side
            response = f"Oh, I love {preferred_side}!"
        case side:  # Assigns side = entree[-1].
            response = f"Well, could I have their Spam instead of the {side} then?"

Note the ``== preferred_side`` example: using an explicit prefix marker on
constraint expressions removes the restriction to only working with attributes
or literals for value lookups. The ``== (-1)`` and ``== (1-1j)`` examples
illustrate the use of parentheses to turn any subexpression into an atomic one.
This PEP retains the caching property specified for value patterns in PEP 634:
if a particular constraint pattern occurs more than once in a given match
statement, language implementations are explicitly permitted to cache the first
calculation on any given match statement execution and re-use it in other
clauses. (This implicit caching is less necessary in this PEP, given that
explicit local variable caching becomes a valid option, but it still seems a
useful property to preserve.)
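
The difference between the two constraint kinds can be approximated in current
Python with plain comparisons. The following is a minimal sketch rather than the
proposed implementation: ``eq_constraint``, ``id_constraint`` and
``AlwaysEqual`` are hypothetical names introduced purely for illustration::

    class AlwaysEqual:
        """Hypothetical type whose instances compare equal to everything."""

        def __eq__(self, other):
            return True

        def __hash__(self):
            return 0


    def eq_constraint(subject, expected):
        """Approximates ``case == EXPR``: succeeds when the values compare equal."""
        return subject == expected


    def id_constraint(subject, expected):
        """Approximates ``case is EXPR``: succeeds only for the very same object."""
        return subject is expected


    SENTINEL = object()
    impostor = AlwaysEqual()

    print(eq_constraint(impostor, SENTINEL))  # True: AlwaysEqual.__eq__ says so
    print(id_constraint(impostor, SENTINEL))  # False: two distinct objects
    print(eq_constraint(1, True))             # True: bool is a subclass of int
    print(id_constraint(1, True))             # False: 1 is not the True singleton

An identity constraint is therefore the safer choice for sentinels, since an
equality constraint defers to whatever ``__eq__`` the subject defines.
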
Inferred constraint patterns
----------------------------
Inferred constraint patterns use the syntax proposed for literal and value
patterns in PEP 634, but arrange them differently in the proposed grammar to
allow for a straightforward transformation by the parser into explicit
constraints in the AST output::

    inferred_constraint_pattern:
        | inferred_id_constraint # Emits same parser output as id_constraint
        | inferred_eq_constraint # Emits same parser output as eq_constraint

    inferred_id_constraint:
        | 'None'
        | '...'

    inferred_eq_constraint:
        | attr_constraint
        | numeric_constraint
        | strings

    attr_constraint: attr !('.' | '(' | '=')
    attr: name_or_attr '.' NAME
    name_or_attr: attr | NAME

    numeric_constraint:
        | signed_number !('+' | '-')
        | signed_number '+' NUMBER
        | signed_number '-' NUMBER

    signed_number: NUMBER | '-' NUMBER

The terminology changes slightly to refer to them as a kind of constraint
rather than as a kind of pattern, clearly separating the subelements inside
patterns into "patterns", which define structures and name binding targets to
match against, and "constraints", which look up existing values to compare
against.
In practice, the key differences between this PEP's inferred constraint patterns
and PEP 634's value patterns and literal patterns are that:

* inferred constraint patterns won't actually exist in the AST definition.
  Instead, they'll be replaced by an explicit constraint node, exactly as if
  they had been written with the explicit ``==`` or ``is`` prefix
* ``None`` and ``...`` are handled as part of a separate grammar rule, rather
  than needing to be handled as a special case of literal patterns in the parser
* equality constraints are inferred for f-strings in addition to being inferred
  for string literals
* inferred constraints for ``True`` and ``False`` are dropped entirely on
  grounds of ambiguity
* numeric constraints don't enforce the restriction that they be limited to
  complex literals (only that they be limited to single numbers, or the
  addition or subtraction of two such numbers)

Note: even with inferred constraints handled entirely at the parser level, it
would still be possible to limit the inference of equality constraints to
complex numbers if the tokeniser was amended to emit a different token type
(e.g. ``INUMBER``) for imaginary numbers. The PEP doesn't currently propose
making that change (in line with its generally permissive approach), but it
could be amended to do so if desired.
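
The reason the grammar cannot single out complex literals on its own can be
seen with the standard library's ``tokenize`` module, which reports real and
imaginary literals under the same ``NUMBER`` token type. The helper name
``number_tokens`` is hypothetical, and this is only an illustrative sketch::

    import io
    import tokenize


    def number_tokens(source):
        """Return (token name, text) pairs for the numeric tokens in ``source``."""
        tokens = tokenize.generate_tokens(io.StringIO(source).readline)
        return [
            (tokenize.tok_name[tok.type], tok.string)
            for tok in tokens
            if tok.type == tokenize.NUMBER
        ]


    print(number_tokens("1 - 1j"))  # [('NUMBER', '1'), ('NUMBER', '1j')]
    print(number_tokens("1 - 1"))   # [('NUMBER', '1'), ('NUMBER', '1')]

Since both operands arrive as ``NUMBER``, a rule of the form
``signed_number '-' NUMBER`` accepts ``1 - 1`` just as readily as ``1 - 1j``.
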
Mapping patterns
----------------
Mapping patterns inherit the change to replace literal patterns and
value patterns with constraint patterns that allow arbitrary primary
expressions::

    mapping_pattern: '{' [items_pattern] '}'
    items_pattern: ','.key_value_pattern+ ','?
    key_value_pattern:
        | primary ':' or_pattern
        | '**' capture_pattern

However, the constraint marker prefix is not needed in this case, as the fact
this is a key to be looked up rather than a name to be bound can already be
inferred from its position within a mapping pattern.
This means that in simple cases, mapping patterns look exactly as they do in
PEP 634::

    import constants

    match config:
        case {"route": route}:
            process_route(route)
        case {constants.DEFAULT_PORT: sub_config, **rest}:
            process_config(sub_config, rest)

Unlike PEP 634, however, ordinary local and global variables can also be used
to match mapping keys::

    ROUTE_KEY = "route"
    ADDRESS_KEY = "local_address"
    PORT_KEY = "port"

    match config:
        case {ROUTE_KEY: route}:
            process_route(route)
        case {ADDRESS_KEY: address, PORT_KEY: port}:
            process_address(address, port)

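
The key lookup behaviour can be approximated in current Python without any new
syntax. The sketch below is an illustration only: ``match_mapping`` is a
hypothetical helper, and it assumes that a mapping pattern checks for a
``Mapping`` instance, requires every listed key, and ignores extra keys::

    from collections.abc import Mapping

    _MISSING = object()


    def match_mapping(subject, keys):
        """Return the values for ``keys``, or None if the pattern would not match."""
        if not isinstance(subject, Mapping):
            return None
        values = []
        for key in keys:
            value = subject.get(key, _MISSING)
            if value is _MISSING:
                return None  # a required key is absent
            values.append(value)
        return tuple(values)


    ROUTE_KEY = "route"
    ADDRESS_KEY = "local_address"
    PORT_KEY = "port"

    config = {"local_address": "127.0.0.1", "port": 8080, "debug": True}

    print(match_mapping(config, [ROUTE_KEY]))              # None
    print(match_mapping(config, [ADDRESS_KEY, PORT_KEY]))  # ('127.0.0.1', 8080)

Because the keys are ordinary expressions evaluated before the lookup, it makes
no difference whether they are literals, attributes, or local variables.
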
Note: as complex literals are written as binary operations that are evaluated
at compile time, this PEP nominally requires that they be written in parentheses
when used as a key in a mapping pattern. This requirement could be relaxed to
match PEP 634's handling of complex numbers by also accepting
``numeric_constraint`` as defining a valid key expression, and this is how
the draft reference implementation currently works (so the affected PEP 634
test cases will compile and run as expected).
Wildcard patterns
-----------------
Wildcard patterns are changed to use ``__`` (double underscore) rather than
the ``_`` (single underscore) syntax proposed in PEP 634::

    match sequence:
        case [__]:               # any sequence with a single element
            return True
        case [start, *__, end]:  # a sequence with at least two elements
            return start == end
        case __:                 # anything
            return False

This PEP explicitly requires that wildcard patterns be represented in the
Abstract Syntax Tree as something *other than* a regular ``Name`` node.
The draft reference implementation uses the node name ``SkippedBinding`` to
indicate that the node appears where a simple name binding would ordinarily
occur to indicate that nothing should actually be bound, but the exact name of
the node is more an implementation decision than a design one. The key design
requirement is to limit the special casing of ``__`` to the parser and allow the
rest of the compiler to distinguish wildcard patterns from capture patterns
based entirely on the kind of the AST node, rather than needing to inspect the
identifier used in ``Name`` nodes.
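
The division of labour described above can be sketched with stand-in node
classes. Everything here is hypothetical: ``Name`` and ``SkippedBinding`` are
simplified dataclasses rather than real ``ast`` nodes, and ``pattern_target``
and ``bind`` merely model the parser and a later compiler stage::

    from dataclasses import dataclass


    @dataclass(frozen=True)
    class Name:
        """Stand-in for a regular name node (not ``ast.Name``)."""
        id: str


    @dataclass(frozen=True)
    class SkippedBinding:
        """Stand-in for the dedicated wildcard node."""


    def pattern_target(identifier):
        """Parser-level step: the only place that inspects the ``__`` spelling."""
        return SkippedBinding() if identifier == "__" else Name(identifier)


    def bind(node, value, namespace):
        """Later stage: dispatches on the node type, never on the identifier."""
        if isinstance(node, SkippedBinding):
            return  # wildcard: nothing is bound
        namespace[node.id] = value


    namespace = {}
    for identifier, value in [("start", 1), ("__", 2), ("_", 3)]:
        bind(pattern_target(identifier), value, namespace)

    print(namespace)  # {'start': 1, '_': 3}

Note that ``_`` is bound like any other name in this sketch, matching its
status as a regular capture pattern under this PEP.
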
Design Discussion
=================
Treating match pattern syntax as an extension of assignment target syntax
-------------------------------------------------------------------------
PEP 634 already draws inspiration from assignment target syntax in the design
of its sequence pattern matching - while being restricted to sequences for
performance and runtime correctness reasons, sequence patterns are otherwise
very similar to the existing iterable unpacking and tuple packing features seen
in regular assignment statements and function signature declarations.
By requiring that any new semantics introduced by match patterns be given new
syntax that is currently disallowed in assignment targets, one of the goals of
this PEP is to explicitly leave the door open to one or more future PEPs that
enhance assignment target syntax to support some of the new features introduced
by match patterns.
In particular, being able to easily deconstruct mappings into local variables
seems likely to be generally useful, even when there's only one mapping variant
to be matched::

    {"host": host, "port": port, "mode": =="TCP"} = settings

While such code could already be written using a match statement (assuming
either this PEP or PEP 634 were to be accepted into the language), an
assignment statement level variant should be able to provide standardised
exceptions for cases where the right hand side either wasn't a mapping (throwing
``TypeError``), didn't have the specified keys (throwing ``KeyError``), or didn't
have the specific values for the given keys (throwing ``ValueError``), avoiding
the need to write out that exception raising logic in every case.
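
For comparison, the exception raising logic that such a statement would
standardise might look like the following when written by hand today. This is
a sketch only: ``destructure`` and its ``required`` parameter are hypothetical,
and the exception mapping simply follows the suggestion in the paragraph above::

    from collections.abc import Mapping


    def destructure(source, keys, required=None):
        """Extract ``keys`` from ``source`` after checking ``required`` values."""
        if not isinstance(source, Mapping):
            raise TypeError(f"expected a mapping, got {type(source).__name__}")
        for key, expected in (required or {}).items():
            actual = source[key]  # raises KeyError if the key is missing
            if actual != expected:
                raise ValueError(f"{key!r} is {actual!r}, expected {expected!r}")
        return tuple(source[key] for key in keys)  # KeyError on a missing key


    settings = {"host": "localhost", "port": 8080, "mode": "TCP"}
    host, port = destructure(settings, ["host", "port"], required={"mode": "TCP"})
    print(host, port)  # localhost 8080
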
PEP 635 raises the concern that enough aspects of pattern matching semantics
will differ from assignment target semantics that pursuing syntactic parallels
will end up creating confusion rather than reducing it. However, the primary
examples cited as potentially causing confusion are exactly those where the
PEP 634 syntax is *already* the same as that for assignment targets: the fact
that case patterns use iterable unpacking syntax, but only match on sequences
(and specifically exclude strings and byte-strings) rather than consuming
arbitrary iterables is an aspect of PEP 634 that this PEP leaves unchanged.
These semantic differences are intrinsic to the nature of pattern matching:
whereas it is reasonable for a one-shot assignment statement to consume a
one-shot iterator, it isn't reasonable to do that in a construct that's
explicitly about matching a given value against multiple potential targets,
making full use of the available runtime type information to ensure those checks
are as side effect free as possible.
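
The difference is easy to observe in current Python. The sketch below uses a
hypothetical helper, ``could_match_sequence_pattern``, which only approximates
PEP 634's sequence check with ``collections.abc.Sequence``::

    from collections.abc import Sequence


    def could_match_sequence_pattern(subject):
        """Approximate check: a sequence that is not str, bytes or bytearray."""
        return isinstance(subject, Sequence) and not isinstance(
            subject, (str, bytes, bytearray)
        )


    one_shot = iter([1, 2])
    first, second = one_shot  # iterable unpacking consumes the iterator
    print(first, second)      # 1 2
    print(list(one_shot))     # []: nothing is left for a second attempt

    print(could_match_sequence_pattern(iter([1, 2])))  # False
    print(could_match_sequence_pattern([1, 2]))        # True
    print(could_match_sequence_pattern("ab"))          # False

Rejecting the iterator up front means that a failed match against one case
cannot exhaust the subject before the next case is tried.
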
It's an entirely orthogonal question to how the distinction is drawn between
capture patterns and patterns that check for expected values (constraint
patterns in this PEP, literal and value patterns in PEP 634), and it's a big
logical leap to take from "these specific semantic differences between iterable
unpacking and sequence matching are needed in order to handle checking against
multiple potential targets" to "we can reuse attribute binding syntax to mean
equality constraints instead and nobody is going to get confused by that".
Interaction with caching of attribute lookups in local variables
----------------------------------------------------------------
The major change between this PEP and PEP 634 is to offer ``== EXPR`` for value
constraint lookups, rather than only offering ``NAME.ATTR``. The main motivation
for this is to avoid the semantic conflict with regular assignment targets, where
``NAME.ATTR`` is already used in assignment statements to set attributes, so if
``NAME.ATTR`` were the *only* syntax for symbolic value matching, then
we're pre-emptively ruling out any future attempts to allow matching against
single patterns using the existing assignment statement syntax. We'd also be
failing to provide users with suitable scaffolding to help build correct mental
models of what the shorthand forms mean in match patterns (as compared to what
they mean in assignment targets).
However, even within match statements themselves, the ``name.attr`` syntax for
value patterns has an undesirable interaction with local variable assignment,
where routine refactorings that would be semantically neutral for any other
Python statement introduce a major semantic change when applied to a match
statement.
Consider the following code::

    while value < self.limit:
        ... # Some code that adjusts "value"

The attribute lookup can be safely lifted out of the loop and only performed
once::

    _limit = self.limit
    while value < _limit:
        ... # Some code that adjusts "value"

With the marker prefix based syntax proposal in this PEP, constraint patterns
would be similarly tolerant of match patterns being refactored to use a local
variable instead of an attribute lookup, with the following two statements
being functionally equivalent::

    match expr:
        case {"key": == self.target}:
            ... # Handle the case where 'expr["key"] == self.target'
        case _:
            ... # Handle the non-matching case

    _target = self.target
    match expr:
        case {"key": == _target}:
            ... # Handle the case where 'expr["key"] == self.target'
        case _:
            ... # Handle the non-matching case

By contrast, when using the syntactic shorthand that omits the marker prefix,
the following two statements wouldn't be equivalent at all::

    # PEP 634's value pattern syntax / this PEP's attribute constraint syntax
    match expr:
        case {"key": self.target}:
            ... # Handle the case where 'expr["key"] == self.target'
        case _:
            ... # Handle the non-matching case

    _target = self.target
    match expr:
        case {"key": _target}:
            ... # Matches any mapping with "key", binding its value to _target
        case _:
            ... # Handle the non-matching case

This PEP offers a straightforward way to retain the original semantics under
this style of simplistic refactoring: use ``== _target`` to force interpretation
of the result as a constraint pattern instead of a capture pattern (i.e. drop
the no longer applicable syntactic shorthand, and switch to the explicit form).
PEP 634's proposal to offer only the shorthand syntax, with no explicitly
prefixed form, means that the primary answer on offer is "Well, don't do that,
then, only compare against attributes in namespaces, don't compare against
simple names".
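
The hazard can be reproduced on interpreters that implement PEP 634's syntax
(the sketch below assumes Python 3.10 or later). The function names are
hypothetical, and the guard-based variant is merely one workaround available
without a prefix marker, not the spelling proposed by this PEP::

    def naive_refactoring(expr, target):
        """``_target`` is a capture pattern here, so any value for "key" matches."""
        _target = target
        match expr:
            case {"key": _target}:
                return ("matched", _target)
            case _:
                return ("no match", _target)


    def guarded_refactoring(expr, target):
        """A guard keeps the equality check when comparing against a local name."""
        _target = target
        match expr:
            case {"key": value} if value == _target:
                return ("matched", value)
            case _:
                return ("no match", None)


    print(naive_refactoring({"key": "other"}, "wanted"))     # ('matched', 'other')
    print(guarded_refactoring({"key": "other"}, "wanted"))   # ('no match', None)
    print(guarded_refactoring({"key": "wanted"}, "wanted"))  # ('matched', 'wanted')

In the naive version the local variable is silently rebound to the subject's
value, so the comparison the refactoring was meant to preserve never happens.
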
PEP 622's walrus pattern syntax had another odd interaction where it might not
bind the same object as the exact same walrus expression in the body of the
case clause, but PEP 634 fixed that discrepancy by replacing walrus patterns
with AS patterns (where the fact that the value bound to the name on the RHS
might not be the same value as returned by the LHS is a standard feature common
to all uses of the "as" keyword).
Using existing comparison operators as the constraint pattern prefix
--------------------------------------------------------------------
If the need for a dedicated constraint pattern prefix is accepted, then the
next question is to ask exactly what that prefix should be.
The initially published version of this PEP proposed using the previously
unused ``?`` symbol as the prefix for equality constraints, and ``?is`` as the
prefix for identity constraints. When reviewing the PEP, Steven D'Aprano
presented a compelling counterproposal [5_] to use the existing comparison
operators (``==`` and ``is``) instead.
There were a few concerns with ``==`` as a prefix that kept it from being
chosen as the prefix in the initial iteration of the PEP:

* for common use cases, it's even more visually noisy than ``?``, as a lot of
  folks with PEP 8 trained aesthetic sensibilities are going to want to put
  a space between it and the following expression, effectively making it a 3
  character prefix instead of 1
* when used in a class pattern, there needs to be a space between the ``=``
  keyword separator and the ``==`` prefix, or the tokeniser will split them
  up incorrectly (getting ``==`` and ``=`` instead of ``=`` and ``==``)
* when used in a mapping pattern, there needs to be a space between the ``:``
  key/value separator and the ``==`` prefix, or the tokeniser will split them
  up incorrectly (getting ``:=`` and ``=`` instead of ``:`` and ``==``)
* when used in an OR pattern, there needs to be a space between the ``|``
  pattern separator and the ``==`` prefix, or the tokeniser will split them
  up incorrectly (getting ``|=`` and ``=`` instead of ``|`` and ``==``)

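
The three tokenisation problems listed above can be checked with the standard
library's ``tokenize`` module. The helper name ``operator_tokens`` and the
sample fragments are hypothetical; only the operator tokens are of interest, so
the fragments do not need to be valid statements::

    import io
    import tokenize


    def operator_tokens(source):
        """Return the operator tokens the standard tokeniser finds in ``source``."""
        tokens = tokenize.generate_tokens(io.StringIO(source).readline)
        return [tok.string for tok in tokens if tok.type == tokenize.OP]


    print(operator_tokens("attr===value"))    # ['==', '=']
    print(operator_tokens("attr= ==value"))   # ['=', '==']
    print(operator_tokens("key:==value"))     # [':=', '=']
    print(operator_tokens("key: ==value"))    # [':', '==']
    print(operator_tokens("other|==value"))   # ['|=', '=']
    print(operator_tokens("other| ==value"))  # ['|', '==']

In each pair, the tokeniser greedily takes the longest operator it can
recognise, which is why the separating space is needed.
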
Rather than introducing a completely new symbol, Steven's proposed resolution to
this verbosity problem was to retain the ability to omit the prefix marker in
syntactically unambiguous cases.
This prompted a review of the PEP's goals and underlying concerns, and the
determination that the author's core concern was with the idea of not even
*offering* users the ability to be explicit when they wanted or needed to be,
and instead telling them they could only express the intent that the compiler
inferred that they wanted - they couldn't be more explicit and override the
compiler's default inference when it turned out to be wrong (as it inevitably
will be in at least some cases).
Given that perspective, PEP 635's arguments against using ``?`` as part of the
pattern matching syntax held for this proposal as well, and so the PEP was
amended accordingly.
Using ``__`` as the wildcard pattern marker
-------------------------------------------
PEP 635 makes a solid case that introducing ``?`` *solely* as a wildcard pattern
marker would be a bad idea. With the syntax for constraint patterns now changed
to use existing comparison operations rather than ``?`` and ``?is``, that
argument holds for this PEP as well.
However, as noted by Thomas Wouters in [6_], PEP 634's choice of ``_`` remains
problematic as it would likely mean that match patterns would have a *permanent*
difference from all other parts of Python - the use of ``_`` in software
internationalisation and at the interactive prompt means that there isn't really
a plausible path towards using it as a general purpose "skipped binding" marker.
``__`` is an alternative "this value is not needed" marker drawn from a Stack
Overflow answer [7_] (originally posted by the author of this PEP) on the
various meanings of ``_`` in existing Python code.
This PEP also proposes adopting an implementation technique that limits
the scope of the associated special casing of ``__`` to the parser: defining a
new AST node type (``SkippedBinding``) specifically for wildcard markers.
Within the parser, ``__`` would still mean either a regular name or a wildcard
marker in a match pattern depending on where you were in the parse tree, but
within the rest of the compiler, ``Name("__")`` would still be a regular name,
while ``SkippedBinding()`` would always be a wildcard marker.
Unlike ``_``, the lack of other use cases for ``__`` means that there would be
a plausible path towards restoring identifier handling consistency with the rest
of the language by making it mean "skip this name binding" everywhere in Python:

* in the interpreter itself, deprecate loading variables with the name ``__``.
  This would make reading from ``__`` emit a deprecation warning, while writing
  to it would initially be unchanged. To avoid slowing down all name loads, this
  could be handled by having the compiler emit additional code for the
  deprecated name, rather than using a runtime check in the standard name
  loading opcodes.
* after a suitable number of releases, change the parser to emit
  ``SkippedBinding`` for all uses of ``__`` as an assignment target, not just
  those appearing inside match patterns
* consider making ``__`` a true hard keyword rather than a soft keyword

This deprecation path couldn't be followed for ``_``, as there's no way for the
interpreter to distinguish between attempts to read back ``_`` when nominally
used as a "don't care" marker, and legitimate reads of ``_`` as either an
i18n text translation function or as the last statement result at the
interactive prompt.
Names starting with double-underscores are also already reserved for use by the
language, whether that is for compile time constants (i.e. ``__debug__``),
special methods, or class attribute name mangling, so using ``__`` here would
be consistent with that existing approach.
Keeping inferred equality constraints
-------------------------------------
An early (not widely publicised) draft of this proposal considered keeping
PEP 634's literal patterns, as they don't inherently conflict with assignment
statement syntax the way that PEP 634's value patterns do (trying to assign
to a literal is already a syntax error, whereas assigning to a dotted name
sets the attribute).
They were removed in the initially published version due to the fact that they
have the same syntax sensitivity problem as attribute constraints do, where
naively attempting to move the literal pattern out to a local variable for
naming clarity turns the value checking literal pattern into a name binding
capture pattern::

    # PEP 634's literal pattern syntax / this PEP's literal constraint syntax
    match expr:
        case {"port": 443}:
            ... # Handle the case where 'expr["port"] == 443'
        case _:
            ... # Handle the non-matching case

    HTTPS_PORT = 443
    match expr:
        case {"port": HTTPS_PORT}:
            ... # Matches any mapping with "port", binding its value to HTTPS_PORT
        case _:
            ... # Handle the non-matching case

With explicit equality constraints, this style of refactoring keeps the original
semantics (just as it would for a value lookup in any other statement)::

    # This PEP's equality constraints
    match expr:
        case {"port": == 443}:
            ... # Handle the case where 'expr["port"] == 443'
        case _:
            ... # Handle the non-matching case

    HTTPS_PORT = 443
    match expr:
        case {"port": == HTTPS_PORT}:
            ... # Handle the case where 'expr["port"] == 443'
        case _:
            ... # Handle the non-matching case

As noted above, both literal patterns and value patterns made their return (in
the form of inferred equality constraints) as a way to address the verbosity
problem of offering explicit ``==`` prefixed equality constraints as the *only*
way to express equality checks.
However, the presence of the explicit constraint nodes in the AST means that
these special cases can be limited to the parser, with the implicit forms
emitting the same AST nodes as their explicit counterparts.
Inferring equality constraints for f-strings
--------------------------------------------
This is less a design decision in its own right, and more a consequence of
other design decisions:

* the tokeniser and parser don't distinguish f-strings from other kinds of
  strings, so inferring an explicit equality constraint for f-strings happens
  by default when defining the match pattern parser rule for string literals
* the rest of the compiler then treats that output like any other explicit
  equality constraint in an AST pattern node (i.e. allowing arbitrary
  expressions)

This combination of factors makes it awkward to implement a special case that
disallows inferring equality constraints for f-strings while accepting them for
string literals, so the PEP instead opts to just allow them (as they're just as
syntactically unambiguous as any other string in a match pattern).
Keeping inferred identity constraints
-------------------------------------
PEP 635 makes a reasonable case that interpreting a check against ``None``
as ``== None`` would almost always be incorrect, whereas interpreting it as
``is None`` (as advised in PEP 8) would almost always be what the user intended.
Similar reasoning applies to checking against ``...``.
Accordingly, this PEP defines the use of either of these tokens as implying an
identity constraint.
However, as with inferred equality constraints, inferred identity constraints
become explicit identity constraints in the parser output.
Disallowing inferred constraints for ``True`` and ``False``
-----------------------------------------------------------
PEP 635 makes a reasonable case that comparing the ``True`` and ``False``
literals by equality by default is problematic. PEP 8 advises against writing
those comparisons out explicitly in code, so it doesn't make sense for us to
implement a construct that does so implicitly inside the interpreter.
Unlike PEP 635, however, this PEP proposes to resolve the discrepancy by leaving
these two names out of the initial iteration of the inferred constraint syntax
definition entirely, rather than treating them as implying an identity constraint.
This means comparisons against ``True`` and ``False`` in match patterns would
need to be written in one of the following forms:

* comparison by numeric value::

    case 0:
        ...
    case 1:
        ...

* comparison by equality (equivalent to comparison by numeric value)::

    case == False:
        ...
    case == True:
        ...

* comparison by identity::

    case is False:
        ...
    case is True:
        ...

* comparison by value with class check (equivalent to comparison by identity)::

    case bool(False):
        ...
    case bool(True):
        ...

* comparison by boolean coercion::

    case (x, p) if not p:
        ...
    case (x, p) if p:
        ...

The last approach is the one that would most closely follow PEP 8's guidance
for ``if``-``elif`` chains (comparing by boolean coercion), but it's far from
clear at this point how ``True`` and ``False`` literals will end up being used
in pattern matching use cases.
In particular, PEP 635's assessment that users will *probably* mean "comparison
by value with class check", which effectively becomes "comparison by identity"
due to ``True`` and ``False`` being singletons, is a genuinely plausible
suggestion.
However, rather than attempting to guess up front, this PEP proposes that no
shorthand form be offered for these two constants in the initial implementation,
and we instead wait and see if a clearly preferred meaning emerges from actual
usage of the new construct.
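The practical differences between the candidate meanings can be sketched with ordinary comparison expressions. This is only an approximation of what each spelling would check against ``True``, not output from the reference implementation; ``check_forms`` is a hypothetical helper.

```python
def check_forms(value):
    """Evaluate the check each candidate spelling would roughly perform."""
    return {
        "numeric value": value == 1,
        "equality": value == True,  # noqa: E712 (deliberate == True)
        "identity": value is True,
        # Approximates a class check followed by a value comparison
        "class check": isinstance(value, bool) and value == True,  # noqa: E712
        "boolean coercion": bool(value),
    }


for candidate in (True, 1, 1.0, "yes"):
    print(repr(candidate), check_forms(candidate))
```

For ``True`` every check succeeds. For ``1`` and ``1.0`` the numeric and equality checks succeed while the identity and class checks fail. For ``"yes"`` only boolean coercion succeeds.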
Inferred constraints rather than implied constraints
----------------------------------------------------
This PEP uses the term "inferred constraint" to make it clear that the parser
is making assumptions about the user's intent when converting an inferred
constraint to an explicit one.
Calling them "implied constraints" instead would also be reasonable, but that
phrasing has a slightly stronger connotation that the inference is always going
to be correct, and one of the motivations of this PEP is that the inference
*isn't* always going to be correct, so we should be offering a way for users to
be explicit when the parser's assumptions don't align with their intent.
Deferred Ideas
==============
Allowing negated constraints in match patterns
----------------------------------------------
The requirement that constraint expressions be primary expressions means that
it isn't permitted to write ``!= expr`` or ``is not expr``.
Both of these forms have clear potential interpretations as a negated equality
constraint (i.e. ``x != expr``) and a negated identity constraint
(i.e. ``x is not expr``).
However, it's far from clear either form would come up often enough to justify
the dedicated syntax, so the extension has been deferred pending further
community experience with match statements.
Allowing containment checks in match patterns
---------------------------------------------
The syntax used for equality and identity constraints would be straightforward
to extend to containment checks: ``in container``.
One downside of the proposals in both this PEP and PEP 634 is that checking
for multiple values in the same case doesn't look like any existing set
membership check in Python::

    # PEP 634's literal patterns / this PEP's inferred constraints
    match value:
        case 0 | 1 | 2 | 3:
            ...

Explicit equality constraints also become quite verbose if they need to be
repeated::

    match value:
        case == one | == two | == three | == four:
            ...

Containment constraints would provide a more concise way to check if the
match subject was present in a container::

    match value:
        case in {0, 1, 2, 3}:
            ...
        case in {one, two, three, four}:
            ...
        case in range(4): # It would accept any container, not just literal sets
            ...

Such a feature would also be readily extensible to allow all kinds of case
clauses without any further syntax updates, simply by defining ``__contains__``
appropriately on a custom class definition.
However, while this does seem like a useful extension, it isn't essential to
making match statements a valuable addition to the language, so it seems more
appropriate to defer it to a separate proposal, rather than including it here.
Rejected Ideas
==============
Restricting permitted expressions in constraint patterns and mapping pattern keys
---------------------------------------------------------------------------------
While it's entirely technically possible to restrict the kinds of expressions
permitted in constraint patterns and mapping pattern keys to just attribute
lookups and constant literals (as PEP 634 does), there isn't any clear runtime
value in doing so, so this PEP proposes allowing any kind of primary expression
(primary expressions are an existing node type in the grammar that includes
things like literals, names, attribute lookups, function calls, container
subscripts, parenthesised groups, etc).
While PEP 635 does emphasise several times that literal patterns and value
patterns are not full expressions, it doesn't ever articulate a concrete benefit
that is obtained from that restriction (just a theoretical appeal to it being
useful to separate static checks from dynamic checks, which a code style
tool could still enforce, even if the compiler itself is more permissive).
The last time we imposed such a restriction was for decorator expressions and
the primary outcome of that was that users had to put up with years of awkward
syntactic workarounds (like nesting arbitrary expressions inside function calls
that just returned their argument) to express the behaviour they wanted before
the language definition was finally updated to allow arbitrary expressions and
let users make their own decisions about readability.
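As a reminder of what that workaround looked like, here is a minimal sketch. All names (``_identity``, ``register``, ``decorators``, ``on_start``, ``on_stop``) are hypothetical; the second form relies on the relaxed decorator grammar from PEP 614 and so assumes Python 3.9 or later.

```python
def _identity(obj):
    """Return the argument unchanged; exists only to wrap an expression in a call."""
    return obj


registered = []


def register(func):
    """Record the decorated function's name and return it unchanged."""
    registered.append(func.__name__)
    return func


decorators = [register]


# Workaround under the old grammar: hide the subscript inside a function call
@_identity(decorators[0])
def on_start():
    return "started"


# Direct form permitted once arbitrary expressions were allowed (PEP 614)
@decorators[0]
def on_stop():
    return "stopped"


print(registered)  # ['on_start', 'on_stop']
```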
The situation in PEP 634 that bears a resemblance to the situation with decorator
expressions is that arbitrary expressions are technically supported in value
patterns, they just require awkward workarounds where either all the values to
match need to be specified in a helper class that is placed before the match
statement::

    # Allowing arbitrary match targets with PEP 634's value pattern syntax
    class mt:
        value = func()

    match expr:
        case (_, mt.value):
            ... # Handle the case where 'expr[1] == func()'

Or else they need to be written as a combination of a capture pattern and a
guard expression::

    match expr:
        case (_, _matched) if _matched == func():
            ... # Handle the case where 'expr[1] == func()'

This PEP proposes avoiding the need for any such workarounds, and instead
supporting arbitrary value constraints from the start::

    match expr:
        case (__, == func()):
            ... # Handle the case where 'expr[1] == func()'

Whether actually writing that kind of code is a good idea would be a topic for
style guides and code linters, not the language compiler.
In particular, if static analysers can't follow certain kinds of dynamic checks,
then they can limit the permitted expressions at analysis time, rather than the
compiler restricting them at compile time.
There are also some kinds of expressions that are almost certain to give
nonsensical results (e.g. ``yield``, ``yield from``, ``await``) due to the
pattern caching rule, where the number of times the constraint expression
actually gets evaluated will be implementation dependent. Even here, the PEP
takes the view of letting users write nonsense if they really want to.
Aside from the recently updated decorator expressions, another situation where
Python's formal syntax offers full freedom of expression that is almost never
used in practice is in ``except`` clauses: the exceptions to match against
almost always take the form of a simple name, a dotted name, or a tuple of
those, but the language grammar permits arbitrary expressions at that point.
This is a good indication that Python's user base can be trusted to
take responsibility for finding readable ways to use permissive language
features, by avoiding writing hard to read constructs even when they're
permitted by the compiler.
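A short sketch of that rarely used freedom: the expression after ``except`` can be any expression that evaluates to an exception class or a tuple of them. ``recoverable_errors`` and ``lookup`` are hypothetical names.

```python
def recoverable_errors():
    """Hypothetical helper that computes the exceptions to catch at runtime."""
    return (KeyError, IndexError)


def lookup(container, key):
    """Return container[key], or None for the errors deemed recoverable."""
    try:
        return container[key]
    except recoverable_errors():  # a function call, not a simple or dotted name
        return None


print(lookup({"a": 1}, "a"))  # 1
print(lookup({}, "missing"))  # None
print(lookup([1], 5))         # None
```

Code like this is legal but seldom written, which is the point being made about permissive grammar and readable practice.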
This permissiveness comes with a real concrete benefit on the implementation
side: dozens of lines of match statement specific code in the compiler are
replaced by simple calls to the existing code for compiling expressions
(including in the AST validation pass, the AST optimization pass, the symbol
table analysis pass, and the code generation pass). This implementation
benefit would accrue not just to CPython, but to every other Python
implementation looking to add match statement support.
Requiring the use of constraint prefix markers for mapping pattern keys
-----------------------------------------------------------------------
The initial (unpublished) draft of this proposal suggested requiring mapping
pattern keys be constraint patterns, just as PEP 634 requires that they be valid
literal or value patterns::

    import constants

    match config:
        case {?"route": route}:
            process_route(route)
        case {?constants.DEFAULT_PORT: sub_config, **rest}:
            process_config(sub_config, rest)

However, the extra character was syntactically noisy and unlike its use in
constraint patterns (where it distinguishes them from capture patterns), the
prefix doesn't provide any additional information here that isn't already
conveyed by the expression's position as a key within a mapping pattern.
Accordingly, the proposal was simplified to omit the marker prefix from mapping
pattern keys.
This omission also aligns with the fact that containers may incorporate both
identity and equality checks into their lookup process - they don't purely
rely on equality checks, as would be incorrectly implied by the use of the
equality constraint prefix.
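NaN values give a compact demonstration of containers combining identity and equality checks, since a NaN never compares equal to itself. The variable names in this sketch are illustrative only.

```python
nan = float("nan")

# A NaN is never equal to itself
equal_to_itself = nan == nan

# List membership and dict lookup check identity before equality,
# so the very same object is still found
found_in_list = nan in [nan]
found_in_dict = {nan: "port"}.get(nan)

# A distinct NaN object is neither identical nor equal, so it is not found
other_nan_found = float("nan") in [nan]

print(equal_to_itself, found_in_list, found_in_dict, other_nan_found)
# False True port False
```

A mapping pattern key therefore behaves like a container lookup, not like a pure ``==`` comparison against each key.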
Providing dedicated syntax for binding matched constraint values
----------------------------------------------------------------
The initial (unpublished) draft of this proposal suggested allowing ``NAME?EXPR``
as a syntactically unambiguous shorthand for PEP 622's ``NAME := BASE.ATTR`` or
PEP 634's ``BASE.ATTR as NAME``.
This idea was dropped as it complicated the grammar for no gain in
expressiveness over just using the general purpose approach to combining
capture patterns with other match patterns (i.e. ``?EXPR as NAME`` at the
time, ``== EXPR as NAME`` now) when the identity of the matching object is
important.
This idea is even less appropriate after the switch to using existing comparison
operators as the marker prefix, as both ``NAME == EXPR`` and ``NAME is EXPR``
would look like ordinary comparison operations, with nothing to suggest that
``NAME`` is being bound by the pattern matching process.
Reference Implementation
========================
A reference implementation for this PEP [3_] has been derived from Brandt
Bucher's reference implementation for PEP 634 [4_].
Relative to the text of this PEP, the draft reference implementation currently
implements the variant of mapping patterns where numeric constraints are
accepted in addition to primary expressions (this allowed the PEP 634 mapping
pattern checks for complex keys to run as written).
All other modified patterns have been updated to follow this PEP rather than
PEP 634.
The AST validator for match patterns has not yet been implemented.
There is an implementation decision still to be made around representing
constraint operators in the AST. The draft implementation adds them as new
cases on the existing ``UnaryOp`` node, but there's an argument to be made that
they would be better implemented as a new ``Constraint`` node, since they're
accepted at different points in the syntax tree than other unary operators.
Acknowledgments
===============
The PEP 622 and PEP 634/635/636 authors, as the proposal in this PEP is merely
an attempt to improve the readability of an already well-constructed idea by
proposing that reusing the existing attribute binding syntax to mean an
attribute lookup will be more easily understood as syntactic sugar for a more
explicit underlying expression that's compatible with the existing binding
target syntax than it will be as the *only* way to spell such comparisons in
match patterns.
Steven D'Aprano, who made a convincing case that the key goals of this PEP could
be achieved by using existing comparison tokens to add the ability to override
the compiler when our guesses as to "what most users will want most of the time"
are inevitably incorrect for at least some users some of the time, and retaining
some of PEP 634's syntactic sugar (with a slightly different semantic definition)
to obtain the same level of brevity as PEP 634 in most situations. (Paul
Sokolovsky also independently suggested using ``==`` instead of ``?`` as a
more easily understood prefix for equality constraints).
Thomas Wouters, whose publication of PEP 640 and public review of the structural
pattern matching proposals persuaded the author of this PEP to continue
advocating for a wildcard pattern syntax that a future PEP could plausibly turn
into a hard keyword that always skips binding a reference in any location a
simple name is expected, rather than continuing indefinitely as the match
pattern specific soft keyword that is proposed here.
References
==========
.. [1] Post explaining the syntactic novelties in PEP 622
   https://mail.python.org/archives/list/python-dev@python.org/message/2VRPDW4EE243QT3QNNCO7XFZYZGIY6N3/

.. [2] Declined pull request proposing to list this as a Rejected Idea in PEP 622
   https://github.com/python/peps/pull/1564

.. [3] In-progress reference implementation for this PEP
   https://github.com/ncoghlan/cpython/tree/pep-642-constraint-patterns

.. [4] PEP 634 reference implementation
   https://github.com/python/cpython/pull/22917

.. [5] Steven D'Aprano's cogent criticism of the first published iteration of this PEP
   https://mail.python.org/archives/list/python-dev@python.org/message/BTHFWG6MWLHALOD6CHTUFPHAR65YN6BP/

.. [6] Thomas Wouters' initial review of the structural pattern matching proposals
   https://mail.python.org/archives/list/python-dev@python.org/thread/4SBR3J5IQUYE752KR7C6432HNBSYKC5X/

.. [7] Stack Overflow answer regarding the use cases for ``_`` as an identifier
   https://stackoverflow.com/questions/5893163/what-is-the-purpose-of-the-single-underscore-variable-in-python/5893946#5893946

.. _Appendix A:
Appendix A -- Full Grammar
==========================
Here is the full modified grammar for ``match_stmt``, replacing Appendix A
in PEP 634.
Notation used beyond standard EBNF is as per PEP 634:
- ``'KWD'`` denotes a hard keyword
- ``"KWD"`` denotes a soft keyword
- ``SEP.RULE+`` is shorthand for ``RULE (SEP RULE)*``
- ``!RULE`` is a negative lookahead assertion
::

    match_stmt: "match" subject_expr ':' NEWLINE INDENT case_block+ DEDENT
    subject_expr:
        | star_named_expression ',' [star_named_expressions]
        | named_expression
    case_block: "case" patterns [guard] ':' block
    guard: 'if' named_expression
    patterns: open_sequence_pattern | pattern
    pattern: as_pattern | or_pattern
    as_pattern: or_pattern 'as' capture_pattern
    or_pattern: '|'.closed_pattern+
    closed_pattern:
        | capture_pattern
        | wildcard_pattern
        | constraint_pattern
        | inferred_constraint_pattern
        | group_pattern
        | sequence_pattern
        | mapping_pattern
        | class_pattern
    capture_pattern: !"__" NAME !('.' | '(' | '=')
    wildcard_pattern: "__"
    constraint_pattern:
        | eq_constraint
        | id_constraint
    eq_constraint: '==' primary
    id_constraint: 'is' primary
    inferred_constraint_pattern:
        | inferred_id_constraint
        | inferred_eq_constraint
    inferred_id_constraint:
        | 'None'
        | '...'
    inferred_eq_constraint:
        | attr_constraint
        | numeric_constraint
        | strings
    attr_constraint: attr !('.' | '(' | '=')
    attr: name_or_attr '.' NAME
    name_or_attr: attr | NAME
    numeric_constraint:
        | signed_number !('+' | '-')
        | signed_number '+' NUMBER
        | signed_number '-' NUMBER
    signed_number: NUMBER | '-' NUMBER
    group_pattern: '(' pattern ')'
    sequence_pattern:
        | '[' [maybe_sequence_pattern] ']'
        | '(' [open_sequence_pattern] ')'
    open_sequence_pattern: maybe_star_pattern ',' [maybe_sequence_pattern]
    maybe_sequence_pattern: ','.maybe_star_pattern+ ','?
    maybe_star_pattern: star_pattern | pattern
    star_pattern: '*' (capture_pattern | wildcard_pattern)
    mapping_pattern: '{' [items_pattern] '}'
    items_pattern: ','.key_value_pattern+ ','?
    key_value_pattern:
        | primary ':' pattern
        | double_star_pattern
    double_star_pattern: '**' capture_pattern
    class_pattern:
        | name_or_attr '(' [pattern_arguments ','?] ')'
    pattern_arguments:
        | positional_patterns [',' keyword_patterns]
        | keyword_patterns
    positional_patterns: ','.pattern+
    keyword_patterns: ','.keyword_pattern+
    keyword_pattern: NAME '=' pattern

Copyright
=========
This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: