python-peps/pep-0642.rst

991 lines
41 KiB
ReStructuredText
Raw Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

PEP: 642
Title: Constraint Pattern Syntax for Structural Pattern Matching
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan@gmail.com>
BDFL-Delegate:
Discussions-To: Python-Dev <python-dev@python.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Requires: 634
Created: 26-Sep-2020
Python-Version: 3.10
Post-History: 31-Oct-2020
Resolution:
Abstract
========
This PEP covers an alternative syntax proposal for PEP 634's structural pattern
matching that explicitly anchors match expressions in the existing syntax for
assignment targets, while retaining most semantic aspects of the existing
proposal.
Specifically, this PEP adopts an additional design restriction that PEP 634's
authors considered unreasonable: that any syntax that is common to both
assignment targets and match patterns must have a comparable semantic effect,
while any novel match pattern semantics must use syntax which emits a syntax
error when used in an assignment target.
As a consequence, this PEP proposes the following changes to the proposed match
pattern syntax:
* Literal patterns and value patterns are combined into a single new
pattern type: "constraint patterns"
* Constraint patterns are either equality constraints or identity constraints
* Equality constraints use ``?`` as a prefix marker on an otherwise
arbitrary primary expression: ``?EXPR``
* Identity constraints use `?is` as a prefix marker on an otherwise
arbitrary primary expression: ``?is EXPR``
* There is no special casing of the ``None``, ``True``, or ``False`` literals
* The constraint expression in an equality constraint may be omitted to give a
non-binding wildcard pattern
* Mapping patterns change to allow arbitrary primary expressions as keys
* Attempting to use a dotted name as a match pattern is a syntax error rather
than implying an equality constraint
* Attempting to use a literal as a match pattern is a syntax error rather
than implying an equality or identity constraint
* The ``_`` identifier is no longer syntactically special (it is a normal
capture pattern, just as it is an ordinary assignment target)
Relationship with other PEPs
============================
This PEP both depends on and competes with PEP 634 - the PEP author agrees that
match statements would be a sufficiently valuable addition to the language to
be worth the additional complexity that they add to the learning process, but
disagrees with the idea that "simple name vs literal or attribute lookup" offers
an adequate syntactic distinction between name binding and value lookup
operations in match patterns.
By switching the wildcard pattern to "?", this PEP complements the proposal in
PEP 640 to allow the use of wildcard patterns in other contexts where a name
binding is syntactically required, but the application doesn't actually need
the value.
Motivation
==========
The original PEP 622 (which was later split into PEPs 634, 635, and 636)
incorporated an unstated but essential assumption in its syntax design: that
neither ordinary expressions *nor* the existing assignment target syntax provide
an adequate foundation for the syntax used in match patterns.
While the PEP didn't explicitly state this assumption, one of the PEP authors
explained it clearly on python-dev [1_]:
The actual problem that I see is that we have different cultures/intuitions
fundamentally clashing here. In particular, so many programmers welcome
pattern matching as an "extended switch statement" and find it therefore
strange that names are binding and not expressions for comparison. Others
argue that it is at odds with current assignment statements, say, and
question why dotted names are _/not/_ binding. What all groups seem to
have in common, though, is that they refer to _/their/_ understanding and
interpretation of the new match statement as 'consistent' or 'intuitive'
--- naturally pointing out where we as PEP authors went wrong with our
design.
But here is the catch: at least in the Python world, pattern matching as
proposed by this PEP is an unprecedented and new way of approaching a common
problem. It is not simply an extension of something already there. Even
worse: while designing the PEP we found that no matter from which angle you
approach it, you will run into issues of seeming 'inconsistencies' (which is
to say that pattern matching cannot be reduced to a 'linear' extension of
existing features in a meaningful way): there is always something that goes
fundamentally beyond what is already there in Python. That's why I argue
that arguments based on what is 'intuitive' or 'consistent' just do not
make sense _/in this case/_.
PEP 635 (and PEP 622 before it) makes a strong case that treating capture
patterns as the default usage for simple names in match patterns is the right
approach, and provides a number of examples where having names express value
constraints by default would be confusing (this difference from C/C++ switch
statement semantics is also a key reason it makes sense to use `match` as the
introductory keyword for the new statement rather than `switch`).
However, PEP 635 doesn't even *try* to make the case for the second assertion,
that treating match patterns as a variation on assignment targets also leads to
inherent contradictions. Even a PR submitted to explicitly list this option in
the "Rejected Ideas" section of the original PEP 622 was declined [2_].
This PEP instead starts from the assumption that it *is* possible to treat match
patterns as a variation on assignment targets, and the only essential
differences that emerge relative to the syntactic proposal in PEP 634 are:
* a requirement to use an explicit marker prefix on value lookups rather than
allowing them to be implied by the use of dotted names; and
* a requirement to use a non-binding wildcard marker other than ``_``.
PEP 634 also proposes special casing the literals ``None``, ``True``, and
``False`` so that they're compared by identity when written directly as a
literal pattern, but by equality when referenced by a value pattern. This PEP
eliminates those special cases by proposing distinct syntax for matching by
identity and matching by equality.
Specification
=============
This PEP retains the overall `match`/`case` statement syntax from PEP 634, and
retains both the syntax and semantics for the following match pattern variants:
* capture patterns
* class patterns
* group patterns
* sequence patterns
Pattern combination (both OR and AS patterns) and guard expressions also remain
the same as they are in PEP 634.
Wildcard patterns change their syntactic marker from `_` to `?`.
Literal patterns and value patterns are replaced by constraint
patterns.
Mapping patterns change to allow arbitrary primary expressions for keys, rather
than being restricted to literal patterns or value patterns.
Wildcard patterns
-----------------
Wildcard patterns change their syntactic marker from `_` to `?`::
# Wildcard pattern
match data:
case [?, ?]:
print("Some pair")
print(?) # Error!
With `?` taking over the role of the non-binding syntactically significant
wildcard marker, `_` reverts to working the same way it does in other assignment
contexts: it operates as an ordinary identifier and hence becomes a normal
capture pattern rather than a special case.
Constraint patterns
-------------------
Constraint patterns use the following simplified syntax::
constraint_pattern: id_constraint | eq_constraint
id_constraint: '?' 'is' primary
eq_constraint: '?' primary
The constraint expression is an arbitrary primary expression - it can be a
simple name, a dotted name lookup, a literal, a function call, or any other
primary expression.
While the compiler would allow whitespace between ``?`` and ``is`` in
identity constraints (as they're defined as separate tokens), this PEP
proposes that PEP 8 be updated to recommend writing them like ``?is``, as if
they were a combined unary operator.
If this PEP were to be adopted in preference to PEP 634, then all literal and
value patterns would instead be written as constraint patterns::
# Literal patterns
match number:
case ?0:
print("Nothing")
case ?1:
print("Just one")
case ?2:
print("A couple")
case ?-1:
print("One less than nothing")
case ?(1-1j):
print("Good luck with that...")
# Additional literal patterns
match value:
case ?True:
print("True or 1")
case ?False:
print("False or 0")
case ?None:
print("None")
case ?"Hello":
print("Text 'Hello'")
case ?b"World!":
print("Binary 'World!'")
case ?...:
print("May be useful when writing __getitem__ methods?")
# Matching by identity rather than equality
SENTINEL = object()
match value:
case ?is True:
print("True, not 1")
case ?is False:
print("False, not 0")
case ?is None:
print("None, following PEP 8 comparison guidelines")
case ?is SENTINEL:
print("Matches the sentinel by identity, not just value")
# Constant value patterns
from enum import Enum
class Sides(str, Enum):
SPAM = "Spam"
EGGS = "eggs"
...
preferred_side = Sides.EGGS
match entree[-1]:
case ?Sides.SPAM: # Compares entree[-1] == Sides.SPAM.
response = "Have you got anything without Spam?"
case ?preferred_side: # Compares entree[-1] == preferred_side
response = f"Oh, I love {preferred_side}!"
case side: # Assigns side = entree[-1].
response = f"Well, could I have their Spam instead of the {side} then?"
Note the `?preferred_side` example: using an explicit prefix marker on constraint
expressions removes the restriction to only working with bound names for value
lookups. The `?(1-1j)` example illustrates the use of parentheses to turn any
subexpression into an atomic one.
This PEP retains the caching property specified for value patterns in PEP 634:
if a particular constraint pattern occurs more than once in a given match
statement, language implementations are explicitly permitted to cache the first
calculation on any given match statement execution and re-use it in other
clauses. (This implicit caching is less necessary in this PEP, given that
explicit local variable caching becomes a valid option, but it still seems a
useful property to preserve)
Mapping patterns
----------------
Mapping patterns inherit the change to replace literal patterns and constant
value patterns with constraint patterns::
mapping_pattern: '{' [items_pattern] '}'
items_pattern: ','.key_value_pattern+ ','?
key_value_pattern:
| primary ':' or_pattern
| '**' capture_pattern
However, the constraint marker prefix is not needed in this case, as the fact
this is a key to be looked up rather than a name to be bound is already
implied by its position within a mapping pattern.
This means that in simple cases, mapping patterns look exactly as they do in
PEP 634::
import constants
match config:
case {"route": route}:
process_route(route)
case {constants.DEFAULT_PORT: sub_config, **rest}:
process_config(sub_config, rest)
Unlike PEP 634, however, ordinary local and global variables can also be used
to match mapping keys::
ROUTE_KEY="route"
ADDRESS_KEY="local_address"
PORT_KEY="port"
match config:
case {ROUTE_KEY: route}:
process_route(route)
case {ADDRESS_KEY: address, PORT_KEY: port}:
process_address(address, port)
Note: as complex literals are written as binary operations that are evaluated
at compile time, this PEP requires that they be written in parentheses when
used as a key in a mapping pattern.
Design Discussion
=================
Treating match pattern syntax as an extension of assignment target syntax
-------------------------------------------------------------------------
PEP 634 already draws inspiration from assignment target syntax in the design
of its sequence pattern matching - while being restricted to sequences for
performance and runtime correctness reasons, sequence patterns are otherwise
very similar to the existing iterable unpacking and tuple packing features seen
in regular assignment statements and function signature declarations.
By requiring that any new semantics introduced by match patterns be given new
syntax that is currently disallowed in assignment targets, one of the goals of
this PEP is to explicitly leave the door open to one or more future PEPs that
enhance assignment target syntax to support some of the new features introduced
by match patterns.
In particular, being able to easily deconstruct mappings into local variables
seems likely to be generally useful, even when there's only one mapping variant
to be matched::
{"host": host, "port": port, "mode": ?"TCP"} = settings
While such code could already be written using a match statement (assuming
either this PEP or PEP 634 were to be accepted into the language), an
assignment statement level variant should be able to provide standardised
exceptions for cases where the right hand side either wasn't a mapping (throwing
`TypeError`), didn't have the specified keys (throwing `KeyError`), or didn't
have the specific values for the given keys (throwing `ValueError`), avoiding
the need to write out that exception raising logic in every case.
PEP 635 raises the concern that enough aspects of pattern matching semantics
will differ from assignment target semantics that pursuing syntactic parallels
will end up creating confusion rather than reducing it. However, the primary
examples cited as potentially causing confusion are exactly those where the
PEP 634 syntax is *already* the same as that for assignment targets: the fact
that case patterns use iterable unpacking syntax, but only match on sequences
(and specifically exclude strings and byte-strings) rather than consuming
arbitrary iterables is an aspect of PEP 634 that this PEP leaves unchanged.
These semantic differences are intrinsic to the nature of pattern matching:
whereas it is reasonable for a one-shot assignment statement to consume a
one-shot iterator, it isn't reasonable to do that in a construct that's
explicitly about matching a given value against multiple potential targets,
making full use of the available runtime type information to ensure those checks
are as side effect free as possible.
It's an entirely orthogonal question to how the distinction is drawn between
capture patterns and patterns that check for expected values (constraint
patterns in this PEP, literal and value patterns in PEP 634), and it's a big
logical leap to take from "these specific semantic differences between iterable
unpacking and sequence matching are needed in order to handle checking against
multiple potential targets" to "we can reuse attribute binding syntax to mean
equality constraints instead and nobody is going to get confused by that".
Interaction with caching of attribute lookups in local variables
----------------------------------------------------------------
The major change between this PEP and PEP 634 is the use of `?EXPR` for value
constraint lookups, rather than ``NAME.ATTR``. The main motivation for this is
to avoid the semantic conflict with regular assignment targets, where
``NAME.ATTR`` is already used in assignment statements to set attributes.
However, even within match statements themselves, the ``name.attr`` syntax for
value patterns has an undesirable interaction with local variable assignment,
where routine refactorings that would be semantically neutral for any other
Python statement introduce a major semantic change when applied to a match
statement.
Consider the following code::
while value < self.limit:
... # Some code that adjusts "value"
The attribute lookup can be safely lifted out of the loop and only performed
once::
_limit = self.limit:
while value < _limit:
... # Some code that adjusts "value"
With the marker prefix based syntax proposal in this PEP, constraint patterns
would be similarly tolerant of match patterns being refactored to use a local
variable instead of an attribute lookup, with the following two statements
being functionally equivalent::
match expr:
case {"key": ?self.target}:
... # Handle the case where 'expr["key"] == self.target'
case ?:
... # Handle the non-matching case
_target = self.target
match expr:
case {"key": ?_target}:
... # Handle the case where 'expr["key"] == self.target'
case ?:
... # Handle the non-matching case
By contrast, PEP 634's attribution of additional semantic significance to the
use of attribute lookup notation means that the following two statements
wouldn't be equivalent at all::
# PEP 634's value pattern syntax
match expr:
case {"key": self.target}:
... # Handle the case where 'expr["key"] == self.target'
case _:
... # Handle the non-matching case
_target = self.target
match expr:
case {"key": _target}:
... # Matches any mapping with "key", binding its value to _target
case _:
... # Handle the non-matching case
To be completely clear, the latter statement means the same under this PEP as it
does under PEP 634. The difference is that PEP 634 is relying entirely on the
dotted attribute lookup syntax to identify value patterns, so when the attribute
lookup gets removed, the pattern type immediately changes from a value pattern
to a capture pattern.
By contrast, the explicit marker prefix on constraint patterns in this PEP means
that switching from a dotted lookup to a local variable lookup has no effect on
the kind of pattern that the compiler detects - to change it to a capture
pattern, you have to explicitly remove the marker prefix (which will result in
a syntax error if the binding target isn't a simple name).
PEP 622's walrus pattern syntax had another odd interaction where it might not
bind the same object as the exact same walrus expression in the body of the
case clause, but PEP 634 fixed that discrepancy by replacing walrus patterns
with AS patterns (where the fact that the value bound to the name on the RHS
might not be the same value as returned by the LHS is a standard feature common
to all uses of the "as" keyword).
Using "?" as the constraint pattern prefix
------------------------------------------
If the need for a dedicated constraint pattern prefix is accepted, then the
next question is to ask exactly what that prefix should be.
With multiple constraint patterns potentially appearing inside larger
structural patterns, using a single punctuation character rather than a keyword
is desirable for brevity.
Most potential candidates are already used in Python for another unrelated
purpose, or would integrate poorly with other aspects of the pattern matching
syntax (e.g. ``=`` or ``==`` have multiple problems along those lines, in particular
in the way they would combine with ``=`` as a keyword separator in class
patterns, or ``:`` as a key/value separate in mapping patterns).
This PEP proposes ``?`` as the prefix marker as it isn't currently used in Python's
core syntax, the proposed usage as a prefix marker won't conflict with its
use in other Python related contexts (e.g. looking up object help information in
IPython), and there are plausible mnemonics that may help users to *remember*
what the syntax means even if they can't guess the semantics if exposed to it
without any explanation (mostly that it's a shorthand for the question "Is the
unpacked value at this position equivalent to the value given by the expression?
If not, don't match")).
PEP 635 has a good discussion of the problems with this choice in the context
of using it as the wildcard pattern marker:
An alternative that does not suggest an arbitrary number of items would
be ``?``. This is even being proposed independently from pattern matching in
PEP 640. We feel however that using ``?`` as a special "assignment" target is
likely more confusing to Python users than using ``_``. It violates Python's
(admittedly vague) principle of using punctuation characters only in ways
similar to how they are used in common English usage or in high school math,
unless the usage is very well established in other programming languages
(like, e.g., using a dot for member access).
The question mark fails on both counts: its use in other programming
languages is a grab-bag of usages only vaguely suggested by the idea of a
"question". For example, it means "any character" in shell globbing,
"maybe" in regular expressions, "conditional expression" in C and many
C-derived languages, "predicate function" in Scheme,
"modify error handling" in Rust, "optional argument" and "optional chaining"
in TypeScript (the latter meaning has also been proposed for Python by
PEP 505). An as yet unnamed PEP proposes it to mark optional types,
e.g. int?.
Another common use of ``?`` in programming systems is "help", for example, in
IPython and Jupyter Notebooks and many interactive command-line utilities.
This PEP takes the view that *not* requiring a marker prefix on value lookups
in match patterns results in a cure that is worse than the disease: Python's
first ever syntax-sensitive value lookup where you can't transparently
replace an attribute lookup with a local variable lookup and maintain semantic
equivalence aside from the exact relative timing of the attribute lookup.
Assuming the requirement for a marker prefix is accepted on those grounds, then
the syntactic bar to meet isn't "Can users *guess* what the chosen symbol means
without anyone ever explaining it to them?" but instead the lower standard
applied when choosing the ``@`` symbol for both decorator expressions and matrix
multiplication and the ``:=`` character combination for assignment expressions:
"Can users *remember* what it means once they've had it explained to them at
least once?".
This PEP contends that ``?`` will be able to pass that lower standard, and would
pass it even more readily if PEP 640 were also subsequently adopted to allow it
as a general purpose non-binding wildcard marker that doesn't conflict with the
use of ``_`` in application internationalisation use cases.
PEPs proposing additional meanings for this character would need to take the
pattern matching meaning into account, but wouldn't necessarily fail purely on
that account (e.g. ``@`` was adopted as a binary operator for matrix
multiplication well after its original adoption as a decorator expression
prefix). "Value checking" related use cases such as PEP 505's None-aware
operators would likely fare especially well on that front, but each such
proposal would continue to be judged on a case-by-case basis.
Using ``?`` as the wildcard pattern
-----------------------------------
PEP 635 makes a solid case that introducing ``?`` *solely* as a wildcard pattern
marker would be a bad idea. Continuing on from the text already quoted in the
previous section:
In addition, this would put Python in a rather unique position: The
underscore is used as a wildcard pattern in every programming language
with pattern matching that we could find (including C#, Elixir, Erlang,
F#, Grace, Haskell, Mathematica, OCaml, Ruby, Rust, Scala, Swift, and
Thorn). Keeping in mind that many users of Python also work with other
programming languages, have prior experience when learning Python, and
may move on to other languages after having learned Python, we find that
such well-established standards are important and relevant with respect
to readability and learnability. In our view, concerns that this wildcard
means that a regular name received special treatment are not strong enough
to introduce syntax that would make Python special.
Other languages with pattern matching don't use ``?`` as the wildcard pattern
(they all use ``_``), and without any other usage in Python's syntax, there
wouldn't be any useful prompts to help users remember what ``?`` means when
they encounter it in a match pattern.
In this PEP, the adoption of ``?`` as the wildcard pattern marker instead comes
from asking the question "What does it mean to omit the constraint expression
from a constraint pattern?", and concluding that "match any value" is a more
useful definition in most situations than reporting a syntax error.
That said, one possible modification to consider in the name of making code and
concepts easier to share with other languages would be to exempt ``_`` from the
"no repeated names" compiler check.
With that change, using ``_`` as a wildcard marker would *work* - it would just
also bind the ``_`` name, the same as it does in any other Python assignment
context.
No special casing for ``?None``, ``?True``, and ``?False``
----------------------------------------------------------
This PEP follows PEP 622 in treating ``None``, ``True`` and ``False`` like any other
value constraint, and comparing them by equality, rather than following PEP
634 in proposing that these literals (and only these literals) be handled specially
and compared via identity.
While writing ``x is None`` is a common (and PEP 8 recommended) practice, nobody
litters their ``if``-``elif`` chains with ``x is True`` or ``x is False`` expressions,
they write ``x`` and ``not x``, both of which compare by value, not identity.
Indeed, PEP 8 explicitly disallows the use ``if x is True:`` and ``if x is False:``,
preferring the forms without any comparison operator at all.
The key problem with special casing is that it doesn't interact properly with
Python's historical practice where "a reference is just a reference, it doesn't
matter how it is spelled in the code".
Instead, with the special casing proposed in PEP 634, checking against one of
these values directly would behave differently from checking against it when
saved in a variable or attribute::
# PEP 634's literal pattern syntax
match expr:
case True:
... # Only handles the case where "expr is True"
# PEP 634's value pattern syntax
match expr:
case self.expected_match: # Set to 'True' somewhere else
... # Handles the case where "expr == True"
By contrast, the explicit prefix syntax proposed in this PEP makes it
straightforward to include both equality constraints and identity constraints,
allowing users to specify directly in their case clauses whether they want to
match by identity or by value.
This distinction means that case clauses can even be used to provide a dedicated
code path for exact identity matches on arbitrary objects::
match value:
case ?is obj:
... # Handle being given the exact same object
case ?obj:
... # Handle being given an equivalent object
case ?:
... # Handle the non-matching case
Deferred Ideas
==============
Allowing negated constraints in match patterns
----------------------------------------------
The requirement that constraint expressions be primary expressions means that
it isn't permitted to write ``?not expr`` or ``?is not expr``.
Both of these forms have reasonably clear potential interpretions as a
negated equality constraint (i.e. ``x != expr``) and a negated identity
constraint (i.e. ``x is not expr``).
However, it's far from clear either form would come up often enough to justify
the dedicated syntax, so the extension has been deferred pending further
community experience with match statements.
Note: the compiler can't enforce the primary expression restriction when asked
to compile an AST tree directly, as parentheses used purely for grouping are
lost in the AST generation process. This means the permitted ``?(not expr)``
generates the same AST as the syntactically disallowed ``?not expr`` would.
That isn't a problem though, as in the hypothetical future where this feature
was implemented, ``?not expr`` wouldn't generate the same AST as ``?(not expr)``,
it would generate a new AST node that indicated the use of a negated eqaulity
constraint pattern.
Allowing containment checks in match patterns
---------------------------------------------
The syntax used for identity constraints would be straightforward to extend to
containment checks: ``?in container``.
One downside of the proposal in this PEP relative to PEP 634 is that checking
against multiple possible values becomes noticably more verbose, especially
for literal value checks::
# PEP 634 literal pattern
match value:
case 0 | 1 | 2 | 3:
...
# This PEP's equality constraints
match value:
case ?0 | ?1 | ?2 | ?3:
...
Containment constraints would provide a more concise way to check if the
match subject was present in a container::
match value:
case ?in {0, 1, 2, 3}:
...
case ?in range(4): # It would accept any container, not just literal sets
...
Such a feature would also be readily extensible to allow all kinds of case
clauses without any further syntax updates, simply by defining ``__contains__``
appropriately on a custom class definition.
However, while this does seem like a useful extension, it isn't essential, so
it seems more appropriate to defer it to a separate proposal, rather than
including it here.
Rejected Ideas
==============
Restricting permitted expressions in constraint patterns and mapping pattern keys
---------------------------------------------------------------------------------
While it's entirely technically possible to restrict the kinds of expressions
permitted in constraint patterns and mapping pattern keys to just attribute
lookups and constant literals (as PEP 634 does), there isn't any clear runtime
value in doing so, so this PEP proposes allowing any kind of primary expression
(primary expressions are an existing node type in the grammar that includes
things like literals, names, attribute lookups, function calls, container
subscripts, parenthesised groups, etc).
While PEP 635 does emphasise several times that literal patterns and value
patterns are not full expressions, it doesn't ever articulate a concrete benefit
that is obtained from that restriction (just a theoretical appeal to it being
useful to separate static checks from dynamic checks, which a code style
tool could still enforce, even if the compiler itself is more permissive).
The last time we imposed such a restriction was for decorator expressions and
the primary outcome of that was that users had to put up with years of awkward
syntactic workarounds (like nesting arbitrary expressions inside function calls
that just returned their argument) to express the behaviour they wanted before
the language definition was finally updated to allow arbitrary expressions and
let users make their own decisions about readability.
The situation in PEP 634 that bears a resemblance to the situation with decorator
expressions is that arbitrary expressions are technically supported in value
patterns, they just require awkward workarounds where either all the values to
match need to be specified in a helper class that is placed before the match
statement::
# Allowing arbitrary match targets with PEP 634's value pattern syntax
class mt:
value = func()
match expr:
case (?, mt.value):
... # Handle the case where 'expr[1] == func()'
Or else they need to be written as a combination of a capture pattern and a
guard expression::
match expr:
case (?, _matched) if _matched == func():
... # Handle the case where 'expr[1] == func()'
This PEP proposes skipping requiring any such workarounds, and instead
supporting arbitrary value constraints from the start::
match expr:
case ?func():
... # Handle the case where 'expr == func()'
Whether actually writing that kind of code is a good idea would be a topic for
style guides and code linters, not the language compiler.
In particular, if static analysers can't follow certain kinds of dynamic checks,
then they can limit the permitted expressions at analysis time, rather than the
compiler restricting them at compile time.
There are also some kinds of expressions that are almost certain to give
nonsensical results (e.g. ``yield``, ``yield from``, ``await``) due to the
pattern caching rule, where the number of times the constraint expression
actually gets evaluated will be implementation dependent. Even here, the PEP
takes the view of letting users write nonsense if they really want to.
Aside from the recenty updated decorator expressions, another situation where
Python's formal syntax offers full freedom of expression that is almost never
used in practice is in ``except`` clauses: the exceptions to match against
almost always take the form of a simple name, a dotted name, or a tuple of
those, but the language grammar permits arbitrary expressions at that point.
This is a good indication that Python's user base can be trusted to
take responsibility for finding readable ways to use permissive language
features, by avoiding writing hard to read constructs even when they're
permitted by the compiler.
This permissiveness comes with a real concrete benefit on the implementation
side: dozens of lines of match statement specific code in the compiler is
replaced by simple calls to the existing code for compiling expressions. This
implementation benefit would accrue not just to CPython, but to every other
Python implementation looking to add match statement support.
Keeping literal patterns
------------------------
An early (not widely publicised) draft of this proposal considered keeping
PEP 634's literal patterns, as they don't inherently conflict with assignment
statement syntax the way that PEP 634's value patterns do (trying to assign
to a literal is already a syntax error, whereas assigning to a dotted name
sets the attribute).
They were subsequently removed (replaced by the combination of equality and
identity constraints) due to the fact that they have the same syntax
sensitivity problem as value patterns do, where attempting to move the
literal pattern out to a local variable for naming clarity would turn the
value checking literal pattern into a name binding capture pattern::
# PEP 634's literal pattern syntax
match expr:
case {"port": 443}:
... # Handle the case where 'expr["port"] == 443'
case _:
... # Handle the non-matching case
HTTPS_PORT = 443
match expr:
case {"port": HTTPS_PORT}:
... # Matches any mapping with "port", binding its value to HTTPS_PORT
case _:
... # Handle the non-matching case
With equality constraints, this style of refactoring keeps the original
semantics (just as it would for a value lookup in any other statement)::
# This PEP's equality constraints
match expr:
case {"port": ?443}:
... # Handle the case where 'expr["port"] == 443'
case _:
... # Handle the non-matching case
HTTPS_PORT = 443
match expr:
case {"port": ?HTTPS_PORT}:
... # Handle the case where 'expr["port"] == 443'
case _:
... # Handle the non-matching case
Requiring the use of constraint prefix markers for mapping pattern keys
-----------------------------------------------------------------------
The initial (unpublished) draft of this proposal suggested requiring mapping
pattern keys be constraint patterns, just as PEP 634 requires that they be valid
literal or value patterns::
import constants
match config:
case {?"route": route}:
process_route(route)
case {?constants.DEFAULT_PORT: sub_config, **rest}:
process_config(sub_config, rest)
However, the extra character is syntactically noisy and unlike its use in
constraint patterns (where it distinguishes them from capture patterns), the
prefix doesn't provide any additional information here that isn't already
conveyed by the expression's position as a key within a mapping pattern.
Accordingly, the proposal was simplified to omit the marker prefix from mapping
pattern keys.
This omission also aligns with the fact that containers may incorporate both
identity and equality checks into their lookup process - they don't purely
rely on equality checks, as would be incorrectly implied by the use of the
equality constraint prefix.
Providing dedicated syntax for binding matched constraint values
----------------------------------------------------------------
The initial (unpublished) draft of this proposal suggested allowing ``NAME?EXPR``
as a syntactically unambiguous shorthand for PEP 622's ``NAME := BASE.ATTR`` or
PEP 634's ``BASE.ATTR as NAME``.
This idea was dropped as it complicated the grammar for no gain in
expressiveness over just using the general purpose approach to combining
capture patterns with other match patterns (i.e. ``?EXPR as NAME``) when the
identity of the matching object is important.
Reference Implementation
========================
A reference implementation for this PEP [3_] has been derived from Brandt
Bucher's reference implementation for PEP 634 [4_].
Relative to the text of this PEP, the draft reference implementation currently
retains literal patterns mostly as implemented for PEP 634, except that the
special casing of ``None``, ``True``, and ``False`` has been removed (with
``PEP 642 TODO`` notes added to the code that can be deleted once these patterns
are dropped entirely).
Value patterns, wildcard patterns, and mapping patterns have been updated
to follow this PEP rather than PEP 634.
Removing literal patterns will be a matter of deleting the code out of the
compiler, and then adding either ``?`` or ``?is`` as necessary to the test
cases that no longer compile. This removal isn't necessary to show that the
PEP's syntax proposal is feasible, so that work has been deferred for now.
There will also be an implementation decision to be made around representing
constraint operators in the AST. The draft implementation adds them as new
cases on the existing ``UnaryOp`` node, but it would potentially be better to
implement them as a new ``Constraint`` node, since they're accepted at
different points in the syntax tree than other unary operators.
Acknowledgments
===============
The PEP 622 and PEP 634/635/636 authors, as the proposal in this PEP is merely
an attempt to improve the readability of an already well-constructed idea by
proposing that one of the key new concepts in that proposal (the ability to
express value constraints in a name binding target) is sufficiently notable
to be worthy of using up one of the few remaining unused ASCII punctuation
characters in Python's syntax instead of reusing the existing attribute binding
syntax to mean an attribute lookup.
References
==========
.. [1] Post explaining the syntactic novelties in PEP 622
https://mail.python.org/archives/list/python-dev@python.org/message/2VRPDW4EE243QT3QNNCO7XFZYZGIY6N3/>
.. [2] Declined pull request proposing to list this as a Rejected Idea in PEP 622
https://github.com/python/peps/pull/1564
.. [3] In-progress reference implementation for this PEP
https://github.com/ncoghlan/cpython/tree/pep-642-constraint-patterns
.. [4] PEP 634 reference implementation
https://github.com/python/cpython/pull/22917
.. _Appendix A:
Appendix A -- Full Grammar
==========================
Here is the full modified grammar for ``match_stmt``, replacing Appendix A
in PEP 634.
Notation used beyond standard EBNF is as per PEP 534:
- ``'KWD'`` denotes a hard keyword
- ``"KWD"`` denotes a soft keyword
- ``SEP.RULE+`` is shorthand for ``RULE (SEP RULE)*``
- ``!RULE`` is a negative lookahead assertion
::
match_stmt: "match" subject_expr ':' NEWLINE INDENT case_block+ DEDENT
subject_expr:
| star_named_expression ',' [star_named_expressions]
| named_expression
case_block: "case" patterns [guard] ':' block
guard: 'if' named_expression
patterns: open_sequence_pattern | pattern
pattern: as_pattern | or_pattern
as_pattern: or_pattern 'as' capture_pattern
or_pattern: '|'.closed_pattern+
closed_pattern:
| capture_pattern
| constraint_pattern
| wildcard_pattern
| group_pattern
| sequence_pattern
| mapping_pattern
| class_pattern
capture_pattern: NAME !('.' | '(' | '=')
constraint_pattern: eq_constraint | id_constraint
id_constraint: '?' 'is' primary
eq_constraint: '?' primary
wildcard_pattern: '?'
group_pattern: '(' pattern ')'
sequence_pattern:
| '[' [maybe_sequence_pattern] ']'
| '(' [open_sequence_pattern] ')'
open_sequence_pattern: maybe_star_pattern ',' [maybe_sequence_pattern]
maybe_sequence_pattern: ','.maybe_star_pattern+ ','?
maybe_star_pattern: star_pattern | pattern
star_pattern: '*' (capture_pattern | wildcard_pattern)
mapping_pattern: '{' [items_pattern] '}'
items_pattern: ','.key_value_pattern+ ','?
key_value_pattern:
| primary ':' pattern
| double_star_pattern
double_star_pattern: '**' capture_pattern
class_pattern:
| name_or_attr '(' [pattern_arguments ','?] ')'
attr: name_or_attr '.' NAME
name_or_attr: attr | NAME
pattern_arguments:
| positional_patterns [',' keyword_patterns]
| keyword_patterns
positional_patterns: ','.pattern+
keyword_patterns: ','.keyword_pattern+
keyword_pattern: NAME '=' pattern
Copyright
=========
This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: