PEP 653: Remove MATCH_POSITIONAL and make semantics for MATCH_DEFAULT near identical to PEP 634. (#1893)

* Use class in pattern for matching, rather than subject.

* Remove temporary dict when matching mapping patterns.

* Remove MATCH_POSITIONAL.
Mark Shannon 2021-03-27 12:14:48 +00:00 committed by GitHub
parent be1c166f4b
commit 3686181865
GPG Key ID: 4AEE18F83AFDEB23
1 changed file with 140 additions and 226 deletions


@@ -14,21 +14,15 @@ Abstract
This PEP proposes a semantics for pattern matching that respects the general concept of PEP 634,
but is more precise, easier to reason about, and should be faster.
The object model will be extended with two special (dunder) attributes,
The object model will be extended with a special (dunder) attribute, ``__match_kind__``,
in addition to the ``__match_args__`` attribute from PEP 634, to support pattern matching.
* A ``__match_kind__`` attribute. Must be an integer.
* A ``__match_args__`` attribute. Only needed for those classes wanting to customize matching the class pattern.
If present, it must be a tuple of strings.
* A ``__deconstruct__()`` method. Only needed for customizing matching of class patterns with positional arguments.
Returns an iterable over the components of the deconstructed object.
The ``__match_kind__`` attribute must be an integer.
With this PEP:
* The semantics of pattern matching will be clearer, so that patterns are easier to reason about.
* It will be possible to implement pattern matching in a more efficient fashion.
* Pattern matching will be more usable for complex classes, by allowing classes more control over which patterns they match.
* Pattern matching will be more usable for complex classes, by allowing classes some more control over which patterns they match.
Motivation
==========
@@ -48,35 +42,35 @@ Pattern matching in Python can be defined more precisely without losing express
Improved control over class matching
------------------------------------
PEP 634 assumes that class instances are simply a collection of their attributes,
and that deconstruction by attribute access is the dual of construction. That is not true, as
many classes have a more complex relation between their constructor and internal attributes.
Those classes need to be able to define their own deconstruction.
PEP 634 delegates the decision over whether a class is a sequence or mapping to ``collections.abc``.
Not all classes that could be considered sequences are registered as subclasses of ``collections.abc.Sequence``.
This PEP allows them to match sequence patterns, without the full ``collections.abc.Sequence`` machinery.
For example, using ``sympy``, we might want to write::
# sin(x)**2 + cos(x)**2 == 1
case Add(Pow(sin(a), 2), Pow(cos(b), 2)) if a == b:
return 1
For ``sympy`` to support this pattern for PEP 634 would be possible, but tricky and cumbersome.
With this PEP it can be implemented easily [1]_.
PEP 634 also privileges some builtin classes with a special form of matching, the "self" match.
PEP 634 privileges some builtin classes with a special form of matching, the "self" match.
For example, the pattern ``list(x)`` matches a list and assigns the list to ``x``.
By allowing classes to choose which kinds of pattern they match, other classes can use this form as well.
For example, using ``sympy``, we might want to write::
# a*a == a**2
case Mul(args=[a, b]) if a == b:
return Pow(a, 2)
This requires the ``sympy`` class ``Symbol`` to "self" match.
For ``sympy`` to support this pattern with PEP 634 is possible, but a bit tricky.
With this PEP it can be implemented very easily [1]_.
Robustness
----------
With this PEP, access to attributes during pattern matching becomes well defined and deterministic.
This makes pattern matching less error prone when matching objects with hidden side effects, such as object-relational mappers.
Objects will have control over their own deconstruction, which can help prevent unintended consequences should attribute access have side-effects.
Objects will have more control over their own deconstruction, which can help prevent unintended consequences should attribute access have side-effects.
PEP 634 relies on the ``collections.abc`` module when determining which patterns a value can match, implicitly importing it if necessary.
This PEP will eliminate surprising import errors and misleading audit events from those imports.
Efficient implementation
------------------------
@@ -106,13 +100,6 @@ A match statement performs a sequence of pattern matches. In general, matching a
To determine whether a value can match a particular kind of pattern, we add the ``__match_kind__`` attribute.
This allows the kind of a value to be determined once and in an efficient fashion.
To deconstruct an object, pre-existing special methods can be used for sequence and mapping patterns, but something new is needed for class patterns.
PEP 634 proposes using ad-hoc attribute access, disregarding the possibility of side-effects.
This could be problematic should the attributes of the object be dynamically created or consume resources.
By adding the ``__deconstruct__()`` method, objects can control how they are deconstructed,
and patterns with a different set of attributes can be efficiently rejected.
Should deconstruction of an object make no sense, then classes can define ``__match_kind__`` to reject class patterns completely.
Specification
=============
@@ -133,7 +120,6 @@ bitwise ``or``\ ed with exactly one of these::
0
MATCH_DEFAULT
MATCH_POSITIONAL
MATCH_SELF
.. note::
@@ -143,17 +129,13 @@ bitwise ``or``\ ed with exactly one of these::
Classes inheriting from ``object`` will inherit ``__match_kind__ = MATCH_DEFAULT`` and ``__match_args__ = ()``.
Classes which define ``__match_kind__ & MATCH_POSITIONAL`` to be non-zero must
implement ``__deconstruct__()`` and should consider redefining ``__match_args__``.
* ``__match_args__``: should hold a tuple of strings indicating the names of attributes that are to be considered for matching; it may be empty for positional-only matches.
* ``__deconstruct__()``: should return a sequence which contains the parts of the deconstructed object.
If ``__match_args__`` is overridden, then it is required to hold a tuple of strings. It may be empty.
.. note::
``__match_args__`` will be automatically generated for dataclasses and named tuples, as specified in PEP 634.
The pattern matching implementation is *not* required to check that ``__match_args__`` and ``__deconstruct__`` behave as specified.
If the value of ``__match_args__`` or the result of ``__deconstruct__()`` is not as specified, then
The pattern matching implementation is *not* required to check that ``__match_args__`` behaves as specified.
If the value of ``__match_args__`` is not as specified, then
the implementation may raise any exception, or match the wrong pattern.
Of course, implementations are free to check these properties and provide meaningful error messages if they can do so efficiently.
@@ -190,13 +172,6 @@ translates to::
$value = expr
$kind = type($value).__match_kind__
In addition some helper variables are initialized::
$list = None
$dict = None
$attrs = None
$items = None
Capture patterns
''''''''''''''''
@@ -253,9 +228,6 @@ translates to::
Sequence Patterns
'''''''''''''''''
Before matching the first sequence pattern, but after checking that ``$value`` is a sequence,
``$value`` is converted to a list.
A pattern not including a star pattern::
case [$VARS]:
@@ -264,11 +236,9 @@ translates to::
if $kind & MATCH_SEQUENCE == 0:
FAIL
if $list is None:
$list = list($value)
if len($list) != len($VARS):
if len($value) != len($VARS):
FAIL
$VARS = $list
$VARS = $value
Example: [2]_
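The fixed-length sequence translation above can be exercised as ordinary Python. This is a minimal sketch, not a reference implementation: the numeric flag values, the ``_BUILTIN_KINDS`` table standing in for the builtins' ``__match_kind__`` (``list`` is assumed to mirror ``tuple``), and the function names are all hypothetical. A successful match returns a dict of bindings; FAIL returns ``None``.

```python
# Illustrative flag values only; the text above does not fix the numbers.
MATCH_SEQUENCE = 1
MATCH_MAPPING = 2
MATCH_DEFAULT = 4
MATCH_SELF = 8

# Builtins cannot be given new attributes, so this table stands in for
# their __match_kind__ (exact types only, not subclasses).
_BUILTIN_KINDS = {
    list: MATCH_SEQUENCE | MATCH_SELF,
    tuple: MATCH_SEQUENCE | MATCH_SELF,
    dict: MATCH_MAPPING | MATCH_SELF,
}


def match_kind(value):
    """Return the (emulated) __match_kind__ of type(value)."""
    cls = type(value)
    return _BUILTIN_KINDS.get(cls, getattr(cls, "__match_kind__", MATCH_DEFAULT))


def match_fixed_sequence(value, names):
    """Emulate ``case [n0, n1, ...]:``; return bindings, or None for FAIL."""
    if match_kind(value) & MATCH_SEQUENCE == 0:
        return None
    # No temporary list: len() and unpacking act on the subject directly.
    if len(value) != len(names):
        return None
    return dict(zip(names, value))


print(match_fixed_sequence((1, 2), ["a", "b"]))  # {'a': 1, 'b': 2}
```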
@@ -280,20 +250,15 @@ translates to::
if $kind & MATCH_SEQUENCE == 0:
FAIL
if $list is None:
$list = list($value)
if len($list) < len($VARS):
if len($value) < len($VARS):
FAIL
$VARS = $list # Note that $VARS includes a star expression.
$VARS = $value # Note that $VARS includes a star expression.
Example: [3]_
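The star-pattern translation can be sketched the same way; the block repeats the hypothetical flags and kind table so that it stands alone. Only the non-starred names constrain the length, as in example [3], and the starred name is bound to a list, as in ordinary Python unpacking. ``match_star_sequence`` is a hypothetical name; ``case [a, *b, c]:`` corresponds to ``match_star_sequence(value, ["a"], "b", ["c"])``.

```python
# Illustrative flag values and builtin kinds (assumptions, as before).
MATCH_SEQUENCE = 1
MATCH_MAPPING = 2
MATCH_DEFAULT = 4
MATCH_SELF = 8
_BUILTIN_KINDS = {
    list: MATCH_SEQUENCE | MATCH_SELF,
    tuple: MATCH_SEQUENCE | MATCH_SELF,
    dict: MATCH_MAPPING | MATCH_SELF,
}


def match_kind(value):
    cls = type(value)
    return _BUILTIN_KINDS.get(cls, getattr(cls, "__match_kind__", MATCH_DEFAULT))


def match_star_sequence(value, before, star, after):
    """Emulate a sequence pattern with one starred name; None means FAIL."""
    if match_kind(value) & MATCH_SEQUENCE == 0:
        return None
    # The starred name may capture zero items, so only the fixed names count.
    if len(value) < len(before) + len(after):
        return None
    end = len(value) - len(after)
    bindings = dict(zip(before, value[:len(before)]))
    bindings[star] = list(value[len(before):end])  # starred names bind a list
    bindings.update(zip(after, value[end:]))
    return bindings


print(match_star_sequence([1, 2, 3, 4], ["a"], "b", ["c"]))
# {'a': 1, 'b': [2, 3], 'c': 4}
```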
Mapping Patterns
''''''''''''''''
Before matching the first mapping pattern, but after checking that ``$value`` is a mapping,
``$value`` is converted to a ``dict``.
A pattern not including a double-star pattern::
case {$KEYWORD_PATTERNS}:
@@ -302,13 +267,11 @@ translates to::
if $kind & MATCH_MAPPING == 0:
FAIL
if $dict is None:
$dict = dict($value)
if $dict.keys() != $KEYWORD_PATTERNS.keys():
if $value.keys() != $KEYWORD_PATTERNS.keys():
FAIL
# $KEYWORD_PATTERNS is a meta-variable mapping names to variables.
for $KEYWORD in $KEYWORD_PATTERNS:
$KEYWORD_PATTERNS[$KEYWORD] = $dict[QUOTE($KEYWORD)]
$KEYWORD_PATTERNS[$KEYWORD] = $value[QUOTE($KEYWORD)]
Example: [4]_
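A sketch of the mapping translation without a double-star pattern, under the same hypothetical flags and kind table, with key names doubling as variable names for simplicity. It follows the text as written, so a subject with extra keys fails. Note that PEP 634 ignores extra keys in this case, so this is one point where the two specifications, as worded here, differ.

```python
# Illustrative flag values and builtin kinds (assumptions).
MATCH_SEQUENCE = 1
MATCH_MAPPING = 2
MATCH_DEFAULT = 4
MATCH_SELF = 8
_BUILTIN_KINDS = {
    list: MATCH_SEQUENCE | MATCH_SELF,
    tuple: MATCH_SEQUENCE | MATCH_SELF,
    dict: MATCH_MAPPING | MATCH_SELF,
}


def match_kind(value):
    cls = type(value)
    return _BUILTIN_KINDS.get(cls, getattr(cls, "__match_kind__", MATCH_DEFAULT))


def match_exact_mapping(value, keys):
    """Emulate ``case {"k0": k0, ...}:`` as translated above; None means FAIL."""
    if match_kind(value) & MATCH_MAPPING == 0:
        return None
    # No temporary dict: the subject's own keys() view is compared.
    if value.keys() != set(keys):
        return None
    return {key: value[key] for key in keys}


print(match_exact_mapping({"x": 3, "y": 4}, ["x", "y"]))  # {'x': 3, 'y': 4}
```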
@@ -320,12 +283,10 @@ translates to::
if $kind & MATCH_MAPPING == 0:
FAIL
if $dict is None:
$dict = dict($value)
if $dict.keys() not >= $KEYWORD_PATTERNS.keys():
if not $value.keys() >= $KEYWORD_PATTERNS.keys():
FAIL
# $KEYWORD_PATTERNS is a meta-variable mapping names to variables.
$tmp = dict($dict)
$tmp = dict($value)
for $KEYWORD in $KEYWORD_PATTERNS:
$KEYWORD_PATTERNS[$KEYWORD] = $tmp.pop(QUOTE($KEYWORD))
$DOUBLE_STARRED_PATTERN = $tmp
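With a double-star pattern, the translation copies the subject into a fresh ``dict`` and pops the named keys, so the subject itself is never mutated. Below is a sketch under the same hypothetical flags and kind table. It returns the bindings and the remainder (the value bound by ``**rest``) as a pair.

```python
# Illustrative flag values and builtin kinds (assumptions).
MATCH_SEQUENCE = 1
MATCH_MAPPING = 2
MATCH_DEFAULT = 4
MATCH_SELF = 8
_BUILTIN_KINDS = {
    list: MATCH_SEQUENCE | MATCH_SELF,
    tuple: MATCH_SEQUENCE | MATCH_SELF,
    dict: MATCH_MAPPING | MATCH_SELF,
}


def match_kind(value):
    cls = type(value)
    return _BUILTIN_KINDS.get(cls, getattr(cls, "__match_kind__", MATCH_DEFAULT))


def match_mapping_with_rest(value, keys):
    """Emulate ``case {"k0": k0, ..., **rest}:``; None means FAIL."""
    if match_kind(value) & MATCH_MAPPING == 0:
        return None
    # Every named key must be present; extra keys go to the remainder.
    if not value.keys() >= set(keys):
        return None
    rest = dict(value)  # the copy is what gets popped, never the subject
    bindings = {key: rest.pop(key) for key in keys}
    return bindings, rest


print(match_mapping_with_rest({"x": 1, "y": 2, "z": 3}, ["x", "y"]))
# ({'x': 1, 'y': 2}, {'z': 3})
```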
@@ -344,6 +305,12 @@ translates to::
if not isinstance($value, ClsName):
FAIL
.. note::
``case ClsName():`` is the only class pattern that can succeed if
``($kind & (MATCH_SELF|MATCH_DEFAULT)) == 0``
Class pattern with a single positional pattern::
case ClsName($VAR):
@@ -366,18 +333,10 @@ translates to::
if not isinstance($value, ClsName):
FAIL
if $kind & MATCH_POSITIONAL:
if $items is None:
$items = type($value).__deconstruct__($value)
# $VARS is a meta-variable.
if len($items) < len($VARS):
FAIL
$VARS = $items
elif $kind & MATCH_DEFAULT:
if $attrs is None:
$attrs = type($value).__match_args__
if $kind & MATCH_DEFAULT:
$attrs = ClsName.__match_args__
if len($attrs) < len($VARS):
FAIL
raise TypeError(...)
try:
for i, $VAR in enumerate($VARS):
$VAR = getattr($value, $attrs[i])
@@ -391,7 +350,7 @@ Example: [6]_
.. note::
``__match_args__`` is not checked when matching positional-only class patterns;
this allows classes to match only positional-only patterns by leaving ``__match_args__`` set to the default value of ``()``.
this allows classes to match only positional-only patterns by leaving ``__match_args__`` set to the default value of ``None``.
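The key change in this commit is that positional sub-patterns read ``__match_args__`` from the class named in the pattern (``ClsName``) rather than from ``type($value)``. The sketch below emulates a class pattern with positional sub-patterns under that rule. The flag values, the ``()`` default for a missing ``__match_args__`` (taken from the inherited default stated earlier), and the ``Point``/``Symbol`` classes are assumptions. Only the "self" match with one sub-pattern and the ``MATCH_DEFAULT`` path are modelled.

```python
# Illustrative flag values only.
MATCH_DEFAULT = 4
MATCH_SELF = 8


def match_class_positional(value, cls, names):
    """Emulate ``case cls(n0, n1, ...):``; return bindings, or None for FAIL."""
    if not isinstance(value, cls):
        return None
    kind = getattr(type(value), "__match_kind__", MATCH_DEFAULT)
    if kind & MATCH_SELF and len(names) == 1:
        # "Self" match, e.g. Symbol(x): bind the subject itself.
        return {names[0]: value}
    if kind & MATCH_DEFAULT:
        # __match_args__ is read from the class in the pattern, not type(value).
        attrs = getattr(cls, "__match_args__", ())
        if len(attrs) < len(names):
            raise TypeError(
                f"{cls.__name__}() accepts {len(attrs)} positional sub-pattern(s)"
            )
        try:
            return {name: getattr(value, attr) for name, attr in zip(names, attrs)}
        except AttributeError:
            return None
    return None  # other combinations are outside this sketch


class Point:
    __match_args__ = ("x", "y")

    def __init__(self, x, y):
        self.x, self.y = x, y


class Symbol:
    __match_kind__ = MATCH_SELF


print(match_class_positional(Point(1, 2), Point, ["a", "b"]))  # {'a': 1, 'b': 2}
```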
Class patterns with all keyword patterns::
@@ -401,21 +360,7 @@ translates to::
if not isinstance($value, ClsName):
FAIL
if $kind & MATCH_POSITIONAL:
if $items is None:
$items = type($value).__deconstruct__($value)
if $attrs is None:
$attrs = type($value).__match_args__
$kwname_tuple = tuple(QUOTE($KEYWORD) for $KEYWORD in $KEYWORD_PATTERNS)
$indices = multi_index($attrs, $kwname_tuple, 0)
if $indices is None:
raise TypeError(...)
try:
for $KEYWORD, $index in zip($KEYWORD_PATTERNS, indices):
$KEYWORD_PATTERNS[$KEYWORD] = $items[$index]
except IndexError:
raise TypeError(...)
elif $kind & MATCH_DEFAULT:
if $kind & MATCH_DEFAULT:
try:
for $KEYWORD in $KEYWORD_PATTERNS:
$tmp = getattr($value, QUOTE($KEYWORD))
@@ -425,15 +370,6 @@ translates to::
else:
FAIL
Where the helper function ``multi_index(t, values, min)`` returns a tuple of indices of ``values`` into ``t``,
or ``None`` if any value is not present in ``t`` or the index of the value is less than ``min``.
Examples::
multi_index(("a", "b", "c"), ("a", "c"), 0) == (0,2)
multi_index(("a", "b", "c"), ("a", "c"), 1) is None
multi_index(("a", "b", "c"), ("a", "d"), 0) is None
Example: [7]_
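Keyword-only class patterns never consult ``__match_args__``. Each keyword is an attribute read with ``getattr``, and an ``AttributeError`` becomes FAIL. Because ``getattr`` is used, properties are matched too, which is the PEP 634 behaviour this PEP now follows. Here is a sketch with hypothetical names (``match_class_keywords``, ``Box``), covering only ``MATCH_DEFAULT`` subjects:

```python
# Illustrative flag value only.
MATCH_DEFAULT = 4


def match_class_keywords(value, cls, keywords):
    """Emulate ``case cls(attr=var, ...):``; ``keywords`` maps attr -> var."""
    if not isinstance(value, cls):
        return None
    kind = getattr(type(value), "__match_kind__", MATCH_DEFAULT)
    if not kind & MATCH_DEFAULT:
        return None  # MATCH_SELF and kind 0 are outside this sketch
    bindings = {}
    try:
        for attr, var in keywords.items():
            bindings[var] = getattr(value, attr)
    except AttributeError:
        return None  # a missing attribute is FAIL, not an error
    return bindings


class Box:
    def __init__(self, a):
        self.a = a

    @property
    def double(self):
        return self.a * 2


print(match_class_keywords(Box(3), Box, {"a": "x", "double": "y"}))  # {'x': 3, 'y': 6}
```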
Class patterns with positional and keyword patterns::
@@ -444,35 +380,17 @@ translates to::
if not isinstance($value, ClsName):
FAIL
if $kind & MATCH_POSITIONAL:
if $items is None:
$items = type($value).__deconstruct__($value)
if $attrs is None:
$attrs = type($value).__match_args__
if len($items) < len($VARS):
FAIL
$VARS = $items[:len($VARS)]
$kwname_tuple = tuple(QUOTE($KEYWORD) for $KEYWORD in $KEYWORD_PATTERNS)
$indices = multi_index($attrs, $kwname_tuple, len($VARS))
if $indices is None:
raise TypeError(...)
try:
for $KEYWORD, $index in zip($KEYWORD_PATTERNS, indices):
$KEYWORD_PATTERNS[$KEYWORD] = $items[$index]
except IndexError:
raise TypeError(...)
elif $kind & MATCH_DEFAULT:
if $attrs is None:
$attrs = type($value).__match_args__
if $kind & MATCH_DEFAULT:
$attrs = ClsName.__match_args__
if len($attrs) < len($VARS):
raise TypeError(...)
$positional_names = $attrs[:len($VARS)]
$pos_attrs = $attrs[:len($VARS)]
try:
for i, $VAR in enumerate($VARS):
$VAR = getattr($value, $attrs[i])
for $KEYWORD in $KEYWORD_PATTERNS:
$name = QUOTE($KEYWORD)
if $name in $positional_names:
if $name in $pos_attrs:
raise TypeError(...)
$KEYWORD_PATTERNS[$KEYWORD] = getattr($value, $name)
except AttributeError:
@@ -497,11 +415,9 @@ translates to::
if $kind & MATCH_SEQUENCE == 0:
FAIL
if $list is None:
$list = list($value)
if len($list) != 2:
if len($value) != 2:
FAIL
$value_0, $value_1 = $list
$value_0, $value_1 = $value
#Now match on temporary values
if not isinstance($value_0, int):
FAIL
@@ -526,15 +442,13 @@ Non-conforming ``__match_kind__``
'''''''''''''''''''''''''''''''''
All classes should ensure that the value of ``__match_kind__`` follows the specification.
Therefore, implementations can assume, without checking, that all the following are true::
Therefore, implementations can assume, without checking, that the following are true::
(__match_kind__ & (MATCH_SEQUENCE | MATCH_MAPPING)) != (MATCH_SEQUENCE | MATCH_MAPPING)
(__match_kind__ & (MATCH_SELF | MATCH_POSITIONAL)) != (MATCH_SELF | MATCH_POSITIONAL)
(__match_kind__ & (MATCH_SELF | MATCH_DEFAULT)) != (MATCH_SELF | MATCH_DEFAULT)
(__match_kind__ & (MATCH_DEFAULT | MATCH_POSITIONAL)) != (MATCH_DEFAULT | MATCH_POSITIONAL)
Thus, implementations can assume that ``__match_kind__ & MATCH_SEQUENCE`` implies ``(__match_kind__ & MATCH_MAPPING) == 0``, and vice-versa.
Likewise for ``MATCH_SELF``, ``MATCH_POSITIONAL`` and ``MATCH_DEFAULT``.
Likewise for ``MATCH_SELF`` and ``MATCH_DEFAULT``.
If ``__match_kind__`` does not follow the specification,
then implementations may treat any of the expressions of the form ``$kind & MATCH_...`` above as having any value.
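Implementations may assume these invariants without checking, but they are also free to check them and report a helpful error. The following sketches such a check, assuming illustrative flag values. ``check_match_kind`` is a hypothetical helper that tests only the integer requirement and the two constraints listed above.

```python
# Illustrative flag values only.
MATCH_SEQUENCE = 1
MATCH_MAPPING = 2
MATCH_DEFAULT = 4
MATCH_SELF = 8


def check_match_kind(kind):
    """Return True if ``kind`` satisfies the invariants listed above."""
    if not isinstance(kind, int):
        return False  # __match_kind__ must be an integer
    containers = MATCH_SEQUENCE | MATCH_MAPPING
    class_kinds = MATCH_SELF | MATCH_DEFAULT
    # A class may be a sequence or a mapping, but not both; likewise it may
    # use MATCH_SELF or MATCH_DEFAULT, but not both.
    return (kind & containers) != containers and (kind & class_kinds) != class_kinds


print(check_match_kind(MATCH_SEQUENCE | MATCH_MAPPING))  # False
```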
@@ -558,7 +472,7 @@ For common builtin classes ``__match_kind__`` will be:
* ``tuple``: ``MATCH_SEQUENCE | MATCH_SELF``
* ``dict``: ``MATCH_MAPPING | MATCH_SELF``
Named tuples will have ``__match_kind__`` set to ``MATCH_SEQUENCE | MATCH_POSITIONAL``.
Named tuples will have ``__match_kind__`` set to ``MATCH_SEQUENCE | MATCH_DEFAULT``.
* All other standard library classes for which ``issubclass(cls, collections.abc.Mapping)`` is true will have ``__match_kind__`` set to ``MATCH_MAPPING``.
* All other standard library classes for which ``issubclass(cls, collections.abc.Sequence)`` is true will have ``__match_kind__`` set to ``MATCH_SEQUENCE``.
@@ -574,13 +488,26 @@ on the naive implementation.
When performing matching, implementations are allowed
to treat the following functions and methods as pure:
* ``cls.__len__()`` for any class supporting ``MATCH_SEQUENCE``
* ``dict.keys()``
* ``dict.__contains__()``
* ``dict.__getitem__()``
For any class supporting ``MATCH_SEQUENCE`` or ``MATCH_MAPPING``:
* ``cls.__len__()``
* ``cls.__getitem__()``
For any class supporting ``MATCH_MAPPING``:
* ``cls.keys()``
* ``cls.__contains__()``
Implementations are allowed to make the following assumptions:
* ``isinstance(obj, cls)`` can be freely replaced with ``issubclass(type(obj), cls)`` and vice-versa.
* ``isinstance(obj, cls)`` will always return the same result for any ``(obj, cls)`` pair and repeated calls can thus be elided.
* Reading ``__match_args__`` and calling ``__deconstruct__`` are pure operations, and may be cached.
* Sequences, that is, any class for which ``__match_kind__ & MATCH_SEQUENCE`` is true, are not modified by iteration, subscripting or calls to ``len()``,
and thus those operations can be freely substituted for each other where they would be equivalent when applied to an immutable sequence.
In fact, implementations are encouraged to make these assumptions, as it is likely to result in significantly better performance.
Implementations are allowed to freely replace ``isinstance(obj, cls)`` with ``issubclass(type(obj), cls)`` and vice-versa.
Implementations are also allowed to elide repeated tests of ``isinstance(obj, cls)``.
Security Implications
=====================
@@ -685,20 +612,20 @@ The mapping lane can be implemented, roughly as:
::
# Choose lane
if len($dict) == 2:
if "a" in $dict:
if "b" in $dict:
x = $dict["a"]
y = $dict["b"]
if len($value) == 2:
if "a" in $value:
if "b" in $value:
x = $value["a"]
y = $value["b"]
goto W
if "c" in $dict:
x = $dict["a"]
y = $dict["c"]
if "c" in $value:
x = $value["a"]
y = $value["c"]
goto X
elif len(dict) == 3:
if "a" in $dict and "b" in $dict:
x = $dict["a"]
y = $dict["c"]
elif len($value) == 3:
if "a" in $value and "b" in $value:
x = $value["a"]
y = $value["c"]
goto Y
other = $value
goto Z
@@ -711,17 +638,17 @@ The changes to the semantics can be summarized as:
* Selecting the kind of pattern uses ``cls.__match_kind__`` instead of
``issubclass(cls, collections.abc.Mapping)`` and ``issubclass(cls, collections.abc.Sequence)``
and allows classes control over which kinds of pattern they match.
* Class matching is controlled by the ``__match_kind__`` attribute,
and the ``__deconstruct__`` method allows classes more control over how they are deconstructed.
* The default behavior when matching a class pattern with keyword patterns is more precisely defined,
but is broadly unchanged.
and allows classes a bit more control over which kinds of pattern they match.
* The behavior when matching patterns is more precisely defined, but is otherwise unchanged.
There are no changes to syntax. All examples given in the PEP 636 tutorial should continue to work as they do now.
Rejected Ideas
==============
Using attributes from the instance's dictionary
-----------------------------------------------
An earlier version of this PEP only used attributes from the instance's dictionary when matching a class pattern with ``__match_kind__ == MATCH_DEFAULT``.
The intent was to avoid capturing bound-methods and other synthetic attributes. However, this also meant that properties were ignored.
@@ -738,12 +665,36 @@ For the class::
Ideally we would match the attributes "a" and "p", but not "m".
However, there is no general way to do that, so this PEP now follows the semantics of PEP 634 for ``MATCH_DEFAULT``.
Classes may override this behavior if needed by using ``__match_kind__ == MATCH_POSITIONAL`` or ``__match_args__``.
Open Issues
===========
Lookup of ``__match_args__`` on the subject not the pattern
-----------------------------------------------------------
None, as yet.
An earlier version of this PEP looked up ``__match_args__`` on the class of the subject and
not the class specified in the pattern.
This has been rejected for a few reasons:
* Using the class specified in the pattern is more amenable to optimization and can offer better performance.
* Using the class specified in the pattern has the potential to provide better error reporting in some cases.
* Neither approach is perfect, both have odd corner cases. Keeping the status quo minimizes disruption.
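The two lookup rules only disagree when a subclass overrides ``__match_args__``. In the hypothetical example below, ``case Base(x):`` against a ``Derived`` instance binds a different attribute depending on where ``__match_args__`` is read. Reading it from the pattern's class is the rule kept by this PEP and specified by PEP 634. ``first_positional`` is an illustrative helper, not part of any API.

```python
class Base:
    __match_args__ = ("a",)

    def __init__(self, a, b):
        self.a, self.b = a, b


class Derived(Base):
    __match_args__ = ("b",)  # the subclass overrides the positional order


def first_positional(pattern_cls, subject, lookup_on_pattern=True):
    """Value bound by ``case pattern_cls(x):`` under either lookup rule."""
    source = pattern_cls if lookup_on_pattern else type(subject)
    return getattr(subject, source.__match_args__[0])


d = Derived(1, 2)
print(first_positional(Base, d))         # 1: Base.__match_args__ names "a"
print(first_positional(Base, d, False))  # 2: Derived.__match_args__ names "b"
```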
Deferred Ideas
==============
The original version of this PEP included the match kind ``MATCH_POSITIONAL`` and special method
``__deconstruct__`` which would allow classes full control over their matching. This is important
for libraries like ``sympy``.
For example, using ``sympy``, we might want to write::
# sin(x)**2 + cos(x)**2 == 1
case Add(Pow(sin(a), 2), Pow(cos(b), 2)) if a == b:
return 1
For ``sympy`` to support the positional patterns with current pattern matching is possible,
but is tricky. With these additional features it can be implemented easily [9]_.
It is too late in the 3.10 development cycle for such a change, so this idea will feature in a future PEP for 3.11.
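For reference, the deferred mechanism can be sketched from the ``MATCH_POSITIONAL`` translation that this commit removes. It performs an ``isinstance`` check, calls ``__deconstruct__()``, and FAILs if there are fewer components than sub-patterns. Everything here is hypothetical, including the flag value and the ``Add``/``Pow`` stand-ins for ``sympy`` classes, which are modelled on the ``Basic`` class in reference [9].

```python
# Hypothetical: MATCH_POSITIONAL and __deconstruct__ are deferred, not part
# of this PEP; the flag value is illustrative.
MATCH_POSITIONAL = 16


class Basic:
    __match_kind__ = MATCH_POSITIONAL

    def __init__(self, *args):
        self._args = args

    def __deconstruct__(self):
        return self._args


class Add(Basic):
    pass


class Pow(Basic):
    pass


def match_deconstructed(value, cls, count):
    """Emulate ``case cls(v0, ..., v_{count-1}):`` via __deconstruct__()."""
    if not isinstance(value, cls):
        return None
    if not getattr(type(value), "__match_kind__", 0) & MATCH_POSITIONAL:
        return None
    items = type(value).__deconstruct__(value)
    if len(items) < count:
        return None
    return tuple(items[:count])  # bind the leading components


print(match_deconstructed(Pow("a", 2), Pow, 2))  # ('a', 2)
```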
References
@@ -759,10 +710,8 @@ Code examples
::
class Basic:
__match_kind__ = MATCH_POSITIONAL
def __deconstruct__(self):
return self._args
class Symbol:
__match_kind__ = MATCH_SELF
.. [2]
@@ -774,11 +723,9 @@ translates to::
if $kind & MATCH_SEQUENCE == 0:
FAIL
if $list is None:
$list = list($value)
if len($list) != 2:
if len($value) != 2:
FAIL
a, b = $list
a, b = $value
if not a is b:
FAIL
@@ -792,11 +739,9 @@ translates to::
if $kind & MATCH_SEQUENCE == 0:
FAIL
if $list is None:
$list = list($value)
if len($list) < 2:
if len($value) < 2:
FAIL
a, *b, c = $list
a, *b, c = $value
.. [4]
@@ -808,12 +753,10 @@ translates to::
if $kind & MATCH_MAPPING == 0:
FAIL
if $dict is None:
$dict = dict($value)
if $dict.keys() != {"x", "y"}:
if $value.keys() != {"x", "y"}:
FAIL
x = $dict["x"]
y = $dict["y"]
x = $value["x"]
y = $value["y"]
if not x > 2:
FAIL
@@ -821,17 +764,15 @@ translates to::
This::
case {"x": x, "y": y, **: z}:
case {"x": x, "y": y, **z}:
translates to::
if $kind & MATCH_MAPPING == 0:
FAIL
if $dict is None:
$dict = dict($value)
if not $dict.keys() >= {"x", "y"}:
if not $value.keys() >= {"x", "y"}:
FAIL
$tmp = dict($dict)
$tmp = dict($value)
x = $tmp.pop("x")
y = $tmp.pop("y")
z = $tmp
@@ -846,15 +787,8 @@ translates to::
if not isinstance($value, ClsName):
FAIL
if $kind & MATCH_POSITIONAL:
if $items is None:
$items = type($value).__deconstruct__($value)
if len($items) < 2:
FAIL
x, y = $items
elif $kind & MATCH_DEFAULT:
if $attrs is None:
$attrs = type($value).__match_args__
if $kind & MATCH_DEFAULT:
$attrs = ClsName.__match_args__
if len($attrs) < 2:
FAIL
try:
@@ -875,20 +809,7 @@ translates to::
if not isinstance($value, ClsName):
FAIL
if $kind & MATCH_POSITIONAL:
if $items is None:
$items = type($value).__deconstruct__($value)
if $attrs is None:
$attrs = type($value).__match_args__
$indices = multi_index($attrs, ("a", "b"), 0)
if $indices is None:
raise TypeError(...)
try:
x = $items[$indices[0]]
y = $items[$indices[1]]
except IndexError:
raise TypeError(...)
elif $kind & MATCH_DEFAULT:
if $kind & MATCH_DEFAULT:
try:
x = $value.a
y = $value.b
@@ -908,25 +829,8 @@ translates to::
if not isinstance($value, ClsName):
FAIL
if $kind & MATCH_POSITIONAL:
if $items is None:
$items = type($value).__deconstruct__($value)
if $attrs is None:
$attrs = type($value).__match_args__
if len($items) < 1:
FAIL
x = $items[0]
$indices = multi_index($attrs, ("a",), 1)
if $indices is None:
raise TypeError(...)
$index = $indices[0]
try:
y = $items[$index]
except IndexError:
raise TypeError(...)
elif $kind & MATCH_DEFAULT:
if $attrs is None:
$attrs = type($value).__match_args__
if $kind & MATCH_DEFAULT:
$attrs = ClsName.__match_args__
if len($attrs) < 1:
raise TypeError(...)
$positional_names = $attrs[:1]
@@ -940,6 +844,16 @@ translates to::
else:
FAIL
.. [9]
::
class Basic:
__match_kind__ = MATCH_POSITIONAL
def __deconstruct__(self):
return self._args
Copyright
=========