PEP: 646
Title: Variadic Generics
Author: Mark Mendoza <mendoza.mark.a@gmail.com>,
        Matthew Rahtz <mrahtz@google.com>,
        Pradeep Kumar Srinivasan <gohanpra@gmail.com>,
        Vincent Siles <vsiles@fb.com>
Sponsor: Guido van Rossum <guido@python.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 16-Sep-2020
Python-Version: 3.10
Post-History: 07-Oct-2020, 23-Dec-2020, 29-Dec-2020


Abstract
========

PEP 484 introduced ``TypeVar``, enabling creation of generics parameterised
with a single type. In this PEP, we introduce ``TypeVarTuple``, enabling parameterisation
with an *arbitrary* number of types - that is, a *variadic* type variable,
enabling *variadic* generics. This allows the type of array-like structures
in numerical computing libraries such as NumPy and TensorFlow to be
parameterised with the array *shape*, enabling static type checkers
to catch shape-related bugs in code that uses these libraries.

Motivation
==========

In the context of numerical computation with libraries such as NumPy and
TensorFlow, the *shape* of arguments is often just as important as the
argument *type*. For example, consider the following function which converts a
batch [#batch]_ of videos to grayscale:

::

    def to_gray(videos: Array): ...

From the signature alone, it is not obvious what shape of array [#array]_
we should pass for the ``videos`` argument. Possibilities include, for
example,

    batch × time × height × width × channels

and

    time × batch × channels × height × width. [#timebatch]_

Ideally, we should have some way of making the required shape clear in the
signature itself. Multiple proposals [#numeric-stack]_ [#typing-ideas]_
[#syntax-proposal]_ have suggested the use of the standard generics syntax for
this purpose. We would write:

::

    def to_gray(videos: Array[Time, Batch, Height, Width, Channels]): ...

However, note that arrays can be of arbitrary rank - ``Array`` as used above is
generic in an arbitrary number of axes. One way around this would be to use a different
``Array`` class for each rank...

::

    from typing import Generic, TypeVar

    Axis1 = TypeVar('Axis1')
    Axis2 = TypeVar('Axis2')

    class Array1(Generic[Axis1]): ...

    class Array2(Generic[Axis1, Axis2]): ...

...but this would be cumbersome, both for users (who would have to sprinkle 1s and 2s
and so on throughout their code) and for the authors of array libraries (who would have
to duplicate implementations throughout multiple classes).

Variadic generics are necessary for an ``Array`` that is generic in an arbitrary
number of axes to be cleanly defined as a single class.

Specification
=============

In order to support the above use cases, we introduce ``TypeVarTuple``. This serves
as a placeholder not for a single type but for an *arbitrary* number of types,
behaving like a number of ``TypeVar`` instances packed in a ``Tuple``.
Its semantics are described in detail below.

Type Variable Tuples
--------------------

In the same way that a normal type variable is a stand-in for a single type,
a type variable *tuple* is a stand-in for an arbitrary number of types (zero or
more) in a flat ordered list.

Type variable tuples are created with:

::

    from typing import TypeVarTuple

    Ts = TypeVarTuple('Ts')

Type variable tuples behave like a number of individual type variables packed in a
``Tuple``. To understand this, consider the following example:

::

    from typing import Generic, NewType, TypeVarTuple

    Shape = TypeVarTuple('Shape')

    class Array(Generic[*Shape]): ...

    Height = NewType('Height', int)
    Width = NewType('Width', int)
    x: Array[Height, Width] = Array()

The ``Shape`` type variable tuple here behaves like ``Tuple[T1, T2]``,
where ``T1`` and ``T2`` are type variables. To use these type variables
as type parameters of ``Array``, we must **unpack** the type variable tuple using
the star operator: ``*Shape``. The signature of ``Array`` then behaves
as if we had simply written ``class Array(Generic[T1, T2]): ...``.

In contrast to ``Generic[T1, T2]``, however, ``Generic[*Shape]`` allows
us to parameterise the class with an *arbitrary* number of type parameters.
That is, in addition to being able to define rank-2 arrays such as
``Array[Height, Width]``, we could also define rank-3 arrays, rank-4 arrays,
and so on:

::

    Time = NewType('Time', int)
    Batch = NewType('Batch', int)
    y: Array[Batch, Height, Width] = Array()
    z: Array[Time, Batch, Height, Width] = Array()

Type variable tuples can be used anywhere a normal ``TypeVar`` can.
This includes class definitions, as shown above, as well as function
signatures and variable annotations:

::

    # At the top of the module: lets annotations refer to Array itself.
    from __future__ import annotations

    from typing import Generic, Tuple

    class Array(Generic[*Shape]):

        def __init__(self, shape: Tuple[*Shape]):
            self._shape: Tuple[*Shape] = shape

        def get_shape(self) -> Tuple[*Shape]:
            return self._shape

        def __abs__(self) -> Array[*Shape]: ...

        def __add__(self, other: Array[*Shape]) -> Array[*Shape]: ...

    shape = (Height(480), Width(640))
    x: Array[Height, Width] = Array(shape)
    y = abs(x)  # Inferred type is Array[Height, Width]
    z = x + x   #        ... is Array[Height, Width]

Type Variable Tuples Must Always be Unpacked
''''''''''''''''''''''''''''''''''''''''''''

Note that in the previous example, the ``shape`` argument to ``__init__``
was annotated as ``Tuple[*Shape]``. Why is this necessary - if ``Shape``
behaves like ``Tuple[T1, T2, ...]``, couldn't we have annotated the ``shape``
argument as ``Shape`` directly?

This is, in fact, deliberately not possible: type variable tuples must
*always* be used unpacked (that is, prefixed by the star operator). This is
for two reasons:

* To avoid potential confusion about whether to use a type variable tuple
  in a packed or unpacked form ("Hmm, should I do ``-> Shape``,
  or ``-> Tuple[Shape]``, or ``-> Tuple[*Shape]``...?")
* To improve readability: the star also functions as an explicit visual
  indicator that the type variable tuple is not a normal type variable.

``Unpack`` for Backwards Compatibility
''''''''''''''''''''''''''''''''''''''

Note that the use of the star operator in this context requires a grammar change,
and is therefore available only in new versions of Python. To enable use of type
variable tuples in older versions of Python, we introduce the ``Unpack`` type
operator that can be used in place of the star operator to unpack type variable tuples:

::

    # Unpacking using the star operator in new versions of Python
    class Array(Generic[*Shape]): ...

    # Unpacking using Unpack in older versions of Python
    class Array(Generic[Unpack[Shape]]): ...

Variance, Type Constraints and Type Bounds: Not (Yet) Supported
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

To keep this PEP minimal, ``TypeVarTuple`` does not yet support specification of:

* Variance (e.g. ``TypeVar('T', covariant=True)``)
* Type constraints (``TypeVar('T', int, float)``)
* Type bounds (``TypeVar('T', bound=ParentClass)``)

We leave the decision of how these arguments should behave to a future PEP, when
variadic generics have been tested in the field. As of this PEP, type variable tuples are
**invariant**.

Behaviour when Type Parameters are not Specified
''''''''''''''''''''''''''''''''''''''''''''''''

When a generic class parameterised by a type variable tuple is used without
any type parameters, it behaves as if its type parameters are ``Any, ...``
(an arbitrary number of ``Any``):

::

    def takes_array_of_any_rank(arr: Array): ...

    x: Array[Height, Width]
    takes_array_of_any_rank(x)  # Valid
    y: Array[Time, Height, Width]
    takes_array_of_any_rank(y)  # Also valid

This enables gradual typing: existing functions whose arguments are annotated as,
for example, a plain ``tf.Tensor`` will still be valid even if called with
a parameterised ``Tensor[Height, Width]``.

Type Variable Tuples Must Have Known Length
'''''''''''''''''''''''''''''''''''''''''''

Note that in the ``takes_array_of_any_rank`` example in the previous section,
``Array`` behaved as if it were ``Array[Any, ...]``. This situation - when
type parameters are not specified - is the *only* case when a type variable
tuple may be bound to an unknown-length type. That is:

::

    def foo(x: Tuple[*Ts]): ...

    x: Tuple[float, ...]
    foo(x)  # NOT valid; Ts would be bound to Tuple[float, ...]

(If this is confusing - didn't we say that type variable tuples are a stand-in
for an *arbitrary* number of types? - note the difference between the
length of the type variable tuple *itself*, and the length of the type it is
*bound* to. Type variable tuples themselves can be of arbitrary length -
that is, they can be bound to ``Tuple[int]``, ``Tuple[int, int]``, and
so on - but the types they are bound to must be of known length -
that is, ``Tuple[int, int]``, but not ``Tuple[int, ...]``.)
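
The rule can be illustrated with a toy model. This is a hypothetical sketch, not how any
real type checker is implemented: a concrete tuple type is written as a Python tuple of
type names, a trailing ``...`` (``Ellipsis``) marks an unknown-length type such as
``Tuple[float, ...]``, and ``None`` stands for a generic used without type parameters,
whose parameters behave like ``Any, ...``.

```python
from typing import Optional, Tuple

# Hypothetical toy representation (not a real type checker API):
#   ('int', 'str')  -> Tuple[int, str]
#   ('float', ...)  -> Tuple[float, ...]  (unknown length)
#   None            -> no type parameters given (Any, ...)
TypeList = Tuple[object, ...]
UNPARAMETERISED = None


def bind_type_var_tuple(actual: Optional[TypeList]) -> Optional[TypeList]:
    """Return what a type variable tuple would be bound to, or raise TypeError."""
    if actual is UNPARAMETERISED:
        # The only case where an unknown length is tolerated.
        return UNPARAMETERISED
    if actual and actual[-1] is Ellipsis:
        raise TypeError(f'unknown-length type {actual!r} cannot be bound')
    # Any known length is fine, including zero.
    return actual


print(bind_type_var_tuple(('int', 'int')))  # ('int', 'int')
try:
    bind_type_var_tuple(('float', ...))
except TypeError as err:
    print('Error:', err)
```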

Type Variable Tuple Equality
''''''''''''''''''''''''''''

If the same ``TypeVarTuple`` instance is used in multiple places in a signature
or class, a valid type inference might be to bind the ``TypeVarTuple`` to
a ``Tuple`` of a ``Union`` of types:

::

    def foo(arg1: Tuple[*Ts], arg2: Tuple[*Ts]): ...

    a = (0,)
    b = ('0',)
    foo(a, b)  # Can Ts be bound to Tuple[int | str]?

We do *not* allow this; type unions may *not* appear within the ``Tuple``.
If a type variable tuple appears in multiple places in a signature,
the types must match exactly:

::

    def pointwise_multiply(
        x: Array[*Shape],
        y: Array[*Shape]
    ) -> Array[*Shape]: ...

    x: Array[Height]
    y: Array[Width]
    z: Array[Height, Width]
    pointwise_multiply(x, x)  # Valid
    pointwise_multiply(x, y)  # Error
    pointwise_multiply(x, z)  # Error
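
In the same toy style (hypothetical names, with shapes written as tuples of axis names
rather than real types), the exact-match rule amounts to requiring every occurrence of
``Shape`` to receive the identical tuple, rather than widening the binding to a union:

```python
from typing import Tuple

Axes = Tuple[str, ...]


def bind_shared_shape(*arg_shapes: Axes) -> Axes:
    """Bind one type variable tuple that annotates several arguments.

    Hypothetical toy model: every argument must have exactly the same axes
    (same length, same types in the same order). No union is ever inferred.
    """
    if not arg_shapes:
        raise ValueError('need at least one argument')
    bound = arg_shapes[0]
    for shape in arg_shapes[1:]:
        if shape != bound:
            raise TypeError(f'Shape already bound to {bound!r}, got {shape!r}')
    return bound


x = ('Height',)
y = ('Width',)
z = ('Height', 'Width')

print(bind_shared_shape(x, x))  # ('Height',)
for args in [(x, y), (x, z)]:
    try:
        bind_shared_shape(*args)
    except TypeError as err:
        print('Error:', err)
```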

Multiple Type Variable Tuples: Not Allowed
''''''''''''''''''''''''''''''''''''''''''

As of this PEP, only a single type variable tuple may appear in a type parameter list:

::

    class Array(Generic[*Ts1, *Ts2]): ...  # Error

(``Union`` is the one exception to this rule; see `Type Variable Tuples with Union`_.)

Type Prefixing
--------------

Type variable tuples don't have to be alone; normal types can be
prefixed to them:

::

    Shape = TypeVarTuple('Shape')
    Batch = NewType('Batch', int)

    def add_batch_axis(x: Array[*Shape]) -> Array[Batch, *Shape]: ...
    def del_batch_axis(x: Array[Batch, *Shape]) -> Array[*Shape]: ...

    x: Array[Height, Width]
    y = add_batch_axis(x)  # Inferred type is Array[Batch, Height, Width]
    z = del_batch_axis(y)  # Array[Height, Width]

Normal ``TypeVar`` instances can also be prefixed:

::

    T = TypeVar('T')
    Ts = TypeVarTuple('Ts')

    def prefix_tuple(
        x: T,
        y: Tuple[*Ts]
    ) -> Tuple[T, *Ts]: ...

    z = prefix_tuple(x=0, y=(True, 'a'))
    # Inferred type of z is Tuple[int, bool, str]

As of this PEP, prefixing is the only form of concatenation supported: the
type variable tuple must appear last in the type parameter list. We may
expand the flexibility of concatenation in future PEPs.
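
Binding with a prefix can be sketched in the same hypothetical toy style: the concrete
prefix is matched position by position, and whatever remains is bound to the trailing
type variable tuple (written ``'*Shape'``). The helper names are illustrative only, and
only concrete prefixes such as ``Batch`` (not prefixed ``TypeVar`` instances) are modelled.

```python
from typing import Tuple

Params = Tuple[str, ...]


def match_prefixed(pattern: Params, actual: Params) -> Params:
    """Bind the trailing '*Name' entry of ``pattern`` against ``actual``."""
    *prefix, last = pattern
    if not last.startswith('*') or any(p.startswith('*') for p in prefix):
        raise ValueError('exactly one type variable tuple, in last position')
    n = len(prefix)
    if len(actual) < n or tuple(actual[:n]) != tuple(prefix):
        raise TypeError(f'{actual!r} does not start with {tuple(prefix)!r}')
    return tuple(actual[n:])


def substitute(pattern: Params, bound: Params) -> Params:
    """Rebuild ``pattern`` with the bound types spliced in for '*Name'."""
    *prefix, _ = pattern
    return tuple(prefix) + bound


x = ('Height', 'Width')
# add_batch_axis(x: Array[*Shape]) -> Array[Batch, *Shape]
y = substitute(('Batch', '*Shape'), match_prefixed(('*Shape',), x))
# del_batch_axis(x: Array[Batch, *Shape]) -> Array[*Shape]
z = substitute(('*Shape',), match_prefixed(('Batch', '*Shape'), y))
print(y)  # ('Batch', 'Height', 'Width')
print(z)  # ('Height', 'Width')
```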

``*args`` as a Type Variable Tuple
----------------------------------

PEP 484 states that when a type annotation is provided for ``*args``, each argument
must be of the type annotated. That is, if we specify ``*args`` to be type ``int``,
then *all* arguments must be of type ``int``. This limits our ability to specify
the type signatures of functions that take heterogeneous argument types.

If ``*args`` is annotated as a type variable tuple, however, the types of the
individual arguments become the types in the type variable tuple:

::

    Ts = TypeVarTuple('Ts')

    def args_to_tuple(*args: *Ts) -> Tuple[*Ts]: ...

    args_to_tuple(1, 'a')  # Inferred type is Tuple[int, str]

If no arguments are passed, the type variable tuple behaves like an
empty tuple, ``Tuple[()]``.

Note that, in keeping with the rule that type variable tuples must always
be used unpacked, annotating ``*args`` as being a plain type variable tuple
instance is *not* allowed:

::

    def foo(*args: Ts): ...  # NOT valid

Also note that if the type variable tuple is wrapped in a ``Tuple``,
the old behaviour still applies: all arguments must be a ``Tuple``
parameterised with the same types.

::

    def foo(*args: Tuple[*Ts]): ...

    foo((0,), (1,))    # Valid
    foo((0,), (1, 2))  # Error
    foo((0,), ('1',))  # Error

Following `Type Variable Tuples Must Have Known Length`_, note
that the following should *not* type-check as valid (even though it is, of
course, valid at runtime):

::

    def foo(*args: *Ts): ...

    def bar(x: Tuple[int, ...]):
        foo(*x)  # NOT valid; Ts would be bound to an unknown-length type

Finally, note that a type variable tuple may *not* be used as the type of
``**kwargs``. (We do not yet know of a use case for this feature, so we prefer
to leave the ground fresh for a potential future PEP.)

::

    # NOT valid
    def foo(**kwargs: *Ts): ...

Type Variable Tuples with ``Callable``
--------------------------------------

Type variable tuples can also be used in the arguments section of a
``Callable``:

::

    class Process:
        def __init__(
            self,
            target: Callable[[*Ts], Any],
            args: Tuple[*Ts]
        ): ...

    def func(arg1: int, arg2: str): ...

    Process(target=func, args=(0, 'foo'))  # Valid
    Process(target=func, args=('foo', 0))  # Error

However, note that as of this PEP, if a type variable tuple does appear in
the arguments section of a ``Callable``, it must appear alone.
That is, `Type Prefixing`_ is not supported in the context of ``Callable``.
(Use cases where this might otherwise be desirable are likely covered through use
of either a ``ParamSpec`` from PEP 612, or a type variable tuple in the ``__call__``
signature of a callback protocol from PEP 544.)

Type Variable Tuples with ``Union``
-----------------------------------

Type variable tuples can also be used with ``Union``:

::

    def f(*args: *Ts) -> Union[*Ts]:
        return random.choice(args)

    f(1, 'foo')  # Inferred type is Union[int, str]

More than one type variable tuple may appear in the parameter list
to ``Union``:

::

    def cond_random_choice(
        cond: bool,
        cond_true: Tuple[*Ts1],
        cond_false: Tuple[*Ts2]
    ) -> Union[*Ts1, *Ts2]:
        if cond:
            return random.choice(cond_true)
        else:
            return random.choice(cond_false)

    # Inferred type is Union[int, str, float]
    cond_random_choice(True, (1, 'foo'), (0.0, 'bar'))

If the type variable tuple is empty (e.g. if we had ``*args: *Ts``
and didn't pass any arguments), the type checker should
raise an error on the ``Union`` (matching the behaviour of ``Union``
at runtime, which requires at least one type argument).
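
The runtime behaviour of ``Union`` referred to above is easy to check. In this sketch,
the hypothetical helper ``union_of`` plays the role of ``Union[*Ts]`` once ``Ts`` has
been bound to a concrete tuple of types:

```python
from typing import Union


def union_of(*types: type):
    """Hypothetical helper: build Union[...] from an already-bound tuple."""
    return Union[types]


print(union_of(int, str) == Union[int, str])  # True
try:
    union_of()  # as if Ts were bound to an empty tuple
except TypeError as err:
    print('TypeError:', err)
```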

Aliases
-------

Generic aliases can be created using a type variable tuple in
a similar way to regular type variables:

::

    IntTuple = Tuple[int, *Ts]
    IntTuple[float, bool]  # Equivalent to Tuple[int, float, bool]

As this example shows, all type parameters passed to the alias are
bound to the type variable tuple. If no type parameters are given,
or if an explicitly empty list of type parameters is given, the
type variable tuple in the alias is simply ignored:

::

    # Both equivalent to Tuple[int]
    IntTuple
    IntTuple[()]

Normal ``TypeVar`` instances can also be used in such aliases:

::

    T = TypeVar('T')
    Foo = Tuple[T, *Ts]

    # T is bound to `int`; Ts is bound to `bool, str`
    Foo[int, bool, str]

Note that the same rules for `Type Prefixing`_ apply for aliases.
In particular, only one ``TypeVarTuple`` may occur within an alias,
and the ``TypeVarTuple`` must be at the end of the alias.

Overloads for Accessing Individual Types
----------------------------------------

For situations where we require access to each individual type, overloads can be
used with individual ``TypeVar`` instances in place of the type variable tuple:

::

    Shape = TypeVarTuple('Shape')
    Axis1 = TypeVar('Axis1')
    Axis2 = TypeVar('Axis2')
    Axis3 = TypeVar('Axis3')

    class Array(Generic[*Shape]):

        @overload
        def transpose(
            self: Array[Axis1, Axis2]
        ) -> Array[Axis2, Axis1]: ...

        @overload
        def transpose(
            self: Array[Axis1, Axis2, Axis3]
        ) -> Array[Axis3, Axis2, Axis1]: ...

(For array shape operations in particular, having to specify
overloads for each possible rank is, of course, a rather cumbersome
solution. However, it's the best we can do without additional type
manipulation mechanisms, which are beyond the scope of this PEP.)

An Ideal Array Type: One Possible Example
=========================================

Type variable tuples allow us to make significant progress on the
typing of arrays. However, the array class we have sketched
out in this PEP is still missing some desirable features. [#typing-ideas]_

The most crucial feature missing is the ability to specify
the data type (e.g. ``np.float32`` or ``np.uint8``). This is important
because some numerical computing libraries will silently cast
types, which can easily lead to hard-to-diagnose bugs.

Additionally, it might be useful to be able to specify the rank
instead of the full shape. This could be useful for cases where
axes don't have obvious semantic meaning like 'height' or 'width',
or where the array is very high-dimensional and writing out all
the axes would be too verbose.

Here is one possible example of how these features might be implemented
in a complete array type.

::

    from typing import Generic, Literal, TypeVar, TypeVarTuple

    import numpy as np

    # E.g. Ndim[Literal[3]]
    Integer = TypeVar('Integer')
    class Ndim(Generic[Integer]): ...

    # E.g. Shape[Height, Width]
    # (Where Height and Width are custom types)
    Axes = TypeVarTuple('Axes')
    class Shape(Generic[*Axes]): ...

    DataType = TypeVar('DataType')
    ShapeType = TypeVar('ShapeType', Ndim, Shape)

    # The most verbose type
    # E.g. Array[np.float32, Ndim[Literal[3]]]
    #      Array[np.uint8, Shape[Height, Width, Channels]]
    class Array(Generic[DataType, ShapeType]): ...

    # Type aliases for less verbosity
    # E.g. Float32Array[Height, Width, Channels]
    Float32Array = Array[np.float32, Shape[*Axes]]

    # E.g. Array1D[np.uint8]
    Array1D = Array[DataType, Ndim[Literal[1]]]

Final Notes
===========

**Slice expressions**: type variable tuples may *not* appear in slice expressions.

Rationale and Rejected Ideas
============================

Supporting Variadicity Through Aliases
--------------------------------------

As noted in the Motivation section, it **is** possible to avoid variadic generics
by simply defining aliases for each possible number of type parameters:

::

    class Array1(Generic[Axis1]): ...

    class Array2(Generic[Axis1, Axis2]): ...

However, this seems somewhat clumsy - it requires users to unnecessarily
pepper their code with 1s, 2s, and so on for each rank necessary.

Construction of ``TypeVarTuple``
--------------------------------

``TypeVarTuple`` began as ``ListVariadic``, based on its naming in
an early implementation in Pyre.

We then changed this to ``TypeVar(list=True)``, on the basis that a)
it better emphasises the similarity to ``TypeVar``, and b) the meaning
of 'list' is more easily understood than the jargon of 'variadic'.

Once we'd decided that a variadic type variable should behave like a ``Tuple``,
we also considered ``TypeVar(bound=Tuple)``, which is similarly intuitive
and accomplishes most of what we wanted without requiring any new arguments to
``TypeVar``. However, we realised this may constrain us in the future, if
for example we want type bounds or variance to function slightly differently
for variadic type variables than what the semantics of ``TypeVar`` might
otherwise imply. Also, we may later wish to support arguments that should not be
supported by regular type variables (such as ``arbitrary_len`` [#arbitrary_len]_).

We therefore settled on ``TypeVarTuple``.

Backwards Compatibility
=======================

TODO

* ``Tuple`` needs to be upgraded to support parameterization with a
  type variable tuple.

Reference Implementation
========================

TODO

Footnotes
==========

.. [#batch] 'Batch' is machine learning parlance for 'a number of'.

.. [#array] We use the term 'array' to refer to a matrix with an arbitrary
   number of dimensions. In NumPy, the corresponding class is the ``ndarray``;
   in TensorFlow, the ``Tensor``; and so on.

.. [#timebatch] If the shape begins with 'batch × time', then
   ``videos_batch[0][1]`` would select the second frame of the first video. If the
   shape begins with 'time × batch', then ``videos_batch[1][0]`` would select the
   same frame.

References
==========

.. [#pep-612] PEP 612, "Parameter Specification Variables":
   https://www.python.org/dev/peps/pep-0612

.. [#numeric-stack] Static typing of Python numeric stack:
   https://paper.dropbox.com/doc/Static-typing-of-Python-numeric-stack-summary-6ZQzTkgN6e0oXko8fEWwN

.. [#typing-ideas] Ideas for array shape typing in Python:
   https://docs.google.com/document/d/1vpMse4c6DrWH5rq2tQSx3qwP_m_0lyn-Ij4WHqQqRHY/edit

.. [#syntax-proposal] Shape annotation syntax proposal:
   https://docs.google.com/document/d/1But-hjet8-djv519HEKvBN6Ik2lW3yu0ojZo6pG9osY/edit

.. [#arbitrary_len] Discussion on Python typing-sig mailing list:
   https://mail.python.org/archives/list/typing-sig@python.org/thread/SQVTQYWIOI4TIO7NNBTFFWFMSMS2TA4J/

Acknowledgements
================

Thank you to **Alfonso Castaño**, **Antoine Pitrou**, **Bas v.B.**, **David Foster**,
**Dimitris Vardoulakis**, **Eric Traut**, **Guido van Rossum**, **Jia Chen**,
**Lucio Fernandez-Arjona**, **Nikita Sobolev**, **Peilonrayz**, **Rebecca Chen**,
**Sergei Lebedev** and **Vladimir Mikulik** for helpful feedback and suggestions on
drafts of this PEP.

Thank you especially to **Lucio**, for suggesting the star syntax, which has made
multiple aspects of this proposal much more concise and intuitive.

Resources
=========

Discussions on variadic generics in Python started in 2016 with `Issue 193`__
on the python/typing GitHub repository.

__ https://github.com/python/typing/issues/193

Inspired by this discussion, **Ivan Levkivskyi** made a concrete proposal
at PyCon 2019, summarised in `Type system improvements`__
and `Static typing of Python numeric stack`__.

__ https://paper.dropbox.com/doc/Type-system-improvements-HHOkniMG9WcCgS0LzXZAe

__ https://paper.dropbox.com/doc/Static-typing-of-Python-numeric-stack-summary-6ZQzTkgN6e0oXko8fEWwN

Expanding on these ideas, **Mark Mendoza** and **Vincent Siles** gave a presentation on
`Variadic Type Variables for Decorators and Tensors`__ at the 2019 Python
Typing Summit.

__ https://github.com/facebook/pyre-check/blob/ae85c0c6e99e3bbfc92ec55104bfdc5b9b3097b2/docs/Variadic_Type_Variables_for_Decorators_and_Tensors.pdf

Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: