PEP: 646
Title: Variadic Generics
Author: Mark Mendoza, Matthew Rahtz, Pradeep Kumar Srinivasan, Vincent Siles
Sponsor: Guido van Rossum
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 16-Sep-2020
Python-Version: 3.10
Post-History: 07-Oct-2020, 23-Dec-2020, 29-Dec-2020

Abstract
========

PEP 484 introduced ``TypeVar``, enabling creation of generics parameterised
with a single type. In this PEP, we introduce ``TypeVarTuple``, enabling
parameterisation with an *arbitrary* number of types - that is, a *variadic*
type variable, enabling *variadic* generics. This enables a wide variety of
use cases. In particular, it allows the type of array-like structures in
numerical computing libraries such as NumPy and TensorFlow to be
parameterised with the array *shape*, enabling static type checkers to catch
shape-related bugs in code that uses these libraries.

Motivation
==========

Variadic generics have long been a requested feature, for a myriad of
use cases [#typing193]_. One particular use case - a use case with
potentially large impact, and the main case this PEP targets - concerns
typing in numerical libraries.

In the context of numerical computation with libraries such as NumPy and
TensorFlow, the *shape* of arguments is often just as important as the
argument *type*. For example, consider the following function which converts
a batch [#batch]_ of videos to grayscale:

::

    def to_gray(videos: Array): ...

From the signature alone, it is not obvious what shape of array [#array]_
we should pass for the ``videos`` argument. Possibilities include, for
example, batch × time × height × width × channels and
time × batch × channels × height × width. [#timebatch]_

Ideally, we should have some way of making the required shape clear in the
signature itself. Multiple proposals [#numeric-stack]_ [#typing-ideas]_
[#syntax-proposal]_ have suggested the use of the standard generics syntax
for this purpose.
We would write:

::

    def to_gray(videos: Array[Time, Batch, Height, Width, Channels]): ...

However, note that arrays can be of arbitrary rank - ``Array`` as used above
is generic in an arbitrary number of axes. One way around this would be to
use a different ``Array`` class for each rank...

::

    Axis1 = TypeVar('Axis1')
    Axis2 = TypeVar('Axis2')

    class Array1(Generic[Axis1]): ...

    class Array2(Generic[Axis1, Axis2]): ...

...but this would be cumbersome, both for users (who would have to sprinkle
1s and 2s and so on throughout their code) and for the authors of array
libraries (who would have to duplicate implementations throughout multiple
classes).

Variadic generics are necessary for an ``Array`` that is generic in an
arbitrary number of axes to be cleanly defined as a single class.

Summary Examples
================

Cutting right to the chase, this PEP allows an ``Array`` class that is
generic in its shape (and datatype) to be defined using a newly-introduced
arbitrary-length type variable, ``TypeVarTuple``, as follows:

::

    from typing import Generic, TypeVar, TypeVarTuple

    DType = TypeVar('DType')
    Shape = TypeVarTuple('Shape')

    class Array(Generic[DType, *Shape]):

        def __abs__(self) -> Array[DType, *Shape]: ...

        def __add__(self, other: Array[DType, *Shape]) -> Array[DType, *Shape]: ...

Such an ``Array`` can be used to support a number of different kinds of
shape annotations.
For example, we can add labels describing the semantic meaning of each axis:

::

    from typing import NewType

    Height = NewType('Height', int)
    Width = NewType('Width', int)

    x: Array[float, Height, Width] = Array()

We could also add annotations describing the actual size of each axis:

::

    from typing import Literal

    L640 = Literal[640]
    L480 = Literal[480]

    x: Array[int, L480, L640] = Array()

For consistency, we use semantic axis annotations as the basis of the
examples in this PEP, but this PEP is agnostic about which of these two (or
possibly other) ways of using ``Array`` is preferable; that decision is left
to library authors.

Specification
=============

In order to support the above use cases, we introduce ``TypeVarTuple``.
This serves as a placeholder not for a single type but for an *arbitrary*
number of types, and behaves like a number of ``TypeVar`` instances packed
in a ``Tuple``.

Type Variable Tuples
--------------------

In the same way that a normal type variable is a stand-in for a single type,
a type variable *tuple* is a stand-in for an arbitrary number of types (zero
or more) in a flat ordered list.

Type variable tuples are created with:

::

    from typing import TypeVarTuple

    Ts = TypeVarTuple('Ts')

Type variable tuples behave like a number of individual type variables
packed in a ``Tuple``. To understand this, consider the following example:

::

    Shape = TypeVarTuple('Shape')

    class Array(Generic[*Shape]): ...

    Height = NewType('Height', int)
    Width = NewType('Width', int)
    x: Array[Height, Width] = Array()

The ``Shape`` type variable tuple here behaves like ``Tuple[T1, T2]``,
where ``T1`` and ``T2`` are type variables. To use these type variables
as type parameters of ``Array``, we must *unpack* the type variable tuple
using the star operator: ``*Shape``. The signature of ``Array`` then behaves
as if we had simply written ``class Array(Generic[T1, T2]): ...``.

In contrast to ``Generic[T1, T2]``, however, ``Generic[*Shape]`` allows
us to parameterise the class with an *arbitrary* number of type parameters.
That is, in addition to being able to define rank-2 arrays such as
``Array[Height, Width]``, we could also define rank-3 arrays, rank-4 arrays,
and so on:

::

    Time = NewType('Time', int)
    Batch = NewType('Batch', int)
    y: Array[Batch, Height, Width] = Array()
    z: Array[Time, Batch, Height, Width] = Array()

Type variable tuples can be used anywhere a normal ``TypeVar`` can.
This includes class definitions, as shown above, as well as function
signatures and variable annotations:

::

    class Array(Generic[*Shape]):

        def __init__(self, shape: Tuple[*Shape]):
            self._shape: Tuple[*Shape] = shape

        def get_shape(self) -> Tuple[*Shape]:
            return self._shape

    shape = (Height(480), Width(640))
    x: Array[Height, Width] = Array(shape)
    y = abs(x)  # Inferred type is Array[Height, Width]
    z = x + x   # ...    is Array[Height, Width]

Type Variable Tuples Must Always be Unpacked
''''''''''''''''''''''''''''''''''''''''''''

Note that in the previous example, the ``shape`` argument to ``__init__``
was annotated as ``Tuple[*Shape]``. Why is this necessary - if ``Shape``
behaves like ``Tuple[T1, T2, ...]``, couldn't we have annotated the ``shape``
argument as ``Shape`` directly?

This is, in fact, deliberately not possible: type variable tuples must
*always* be used unpacked (that is, prefixed by the star operator). This is
for two reasons:

* To avoid potential confusion about whether to use a type variable tuple
  in a packed or unpacked form ("Hmm, should I write '``-> Shape``',
  or '``-> Tuple[Shape]``', or '``-> Tuple[*Shape]``'...?")
* To improve readability: the star also functions as an explicit visual
  indicator that the type variable tuple is not a normal type variable.

``Unpack`` for Backwards Compatibility
''''''''''''''''''''''''''''''''''''''

Note that the use of the star operator in this context requires a grammar
change, and is therefore available only in new versions of Python. To enable
use of type variable tuples in older versions of Python, we introduce the
``Unpack`` type operator that can be used in place of the star operator:

::

    # Unpacking using the star operator in new versions of Python
    class Array(Generic[*Shape]): ...

    # Unpacking using ``Unpack`` in older versions of Python
    class Array(Generic[Unpack[Shape]]): ...

Variance, Type Constraints and Type Bounds: Not (Yet) Supported
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

To keep this PEP minimal, ``TypeVarTuple`` does not yet support
specification of:

* Variance (e.g. ``TypeVar('T', covariant=True)``)
* Type constraints (``TypeVar('T', int, float)``)
* Type bounds (``TypeVar('T', bound=ParentClass)``)

We leave the decision of how these arguments should behave to a future PEP,
when variadic generics have been tested in the field. As of this PEP, type
variable tuples are invariant.

Behaviour when Type Parameters are not Specified
''''''''''''''''''''''''''''''''''''''''''''''''

When a generic class parameterised by a type variable tuple is used without
any type parameters, it behaves as if its type parameters are '``Any, ...``'
(an arbitrary number of ``Any``):

::

    def takes_any_array(arr: Array): ...

    x: Array[Height, Width]
    takes_any_array(x)  # Valid
    y: Array[Time, Height, Width]
    takes_any_array(y)  # Also valid

This enables gradual typing: existing functions accepting, for example,
a plain TensorFlow ``Tensor`` will still be valid even if ``Tensor`` is made
generic and calling code passes a ``Tensor[Height, Width]``.

This also works in the opposite direction:

::

    def takes_specific_array(arr: Array[Height, Width]): ...

    z: Array
    takes_specific_array(z)

This way, even if libraries are updated to use types like
``Array[Height, Width]``, users of those libraries won't be forced to also
apply type annotations to all of their code; users still have a choice about
which parts of their code to type and which parts to leave untyped.

Type Variable Tuples Must Have Known Length
'''''''''''''''''''''''''''''''''''''''''''

Type variable tuples may not be bound to a type with unknown length.
That is:

::

    def foo(x: Tuple[*Ts]): ...

    x: Tuple[float, ...]
    foo(x)  # NOT valid; Ts would be bound to ``Tuple[float, ...]``

If this is confusing - didn't we say that type variable tuples are a stand-in
for an *arbitrary* number of types? - note the difference between the
length of the type variable tuple *itself*, and the length of the type it is
*bound* to. Type variable tuples themselves can be of arbitrary length -
that is, they can be bound to ``Tuple[int]``, ``Tuple[int, int]``, and
so on - but the types they are bound to must be of known length -
that is, ``Tuple[int, int]``, but not ``Tuple[int, ...]``.

Note that, as a result of this rule, omitting the type parameter list is the
*only* way of instantiating a generic type with an arbitrary number of
type parameters. (We plan to introduce a more deliberate syntax for this
case in a future PEP.) For example, an unparameterised ``Array`` may
*behave* like ``Array[Any, ...]``, but it cannot be instantiated using
``Array[Any, ...]``, because this would bind its type variable tuple to
``Tuple[Any, ...]``:

::

    x: Array            # Valid
    y: Array[int, ...]  # Error
    z: Array[Any, ...]  # Error

Type Variable Tuple Equality
''''''''''''''''''''''''''''

If the same ``TypeVarTuple`` instance is used in multiple places in a
signature or class, a valid type inference might be to bind the
``TypeVarTuple`` to a ``Tuple`` of a ``Union`` of types:

::

    def foo(arg1: Tuple[*Ts], arg2: Tuple[*Ts]): ...

    a = (0,)
    b = ('0',)
    foo(a, b)  # Can Ts be bound to Tuple[int | str]?

We do *not* allow this; type unions may *not* appear within the ``Tuple``.
If a type variable tuple appears in multiple places in a signature,
the types must match exactly (the list of type parameters must be the same
length, and the type parameters themselves must be identical):

::

    def pointwise_multiply(
        x: Array[*Shape],
        y: Array[*Shape]
    ) -> Array[*Shape]: ...

    x: Array[Height]
    y: Array[Width]
    z: Array[Height, Width]
    pointwise_multiply(x, x)  # Valid
    pointwise_multiply(x, y)  # Error
    pointwise_multiply(x, z)  # Error

Multiple Type Variable Tuples: Not Allowed
''''''''''''''''''''''''''''''''''''''''''

As of this PEP, only a single type variable tuple may appear in a type
parameter list:

::

    class Array(Generic[*Ts1, *Ts2]): ...  # Error

Type Concatenation
------------------

Type variable tuples don't have to be alone; normal types can be
prefixed and/or suffixed:

::

    Shape = TypeVarTuple('Shape')
    Batch = NewType('Batch', int)
    Channels = NewType('Channels', int)

    def add_batch_axis(x: Array[*Shape]) -> Array[Batch, *Shape]: ...
    def del_batch_axis(x: Array[Batch, *Shape]) -> Array[*Shape]: ...
    def add_batch_channels(
      x: Array[*Shape]
    ) -> Array[Batch, *Shape, Channels]: ...

    a: Array[Height, Width]
    b = add_batch_axis(a)      # Inferred type is Array[Batch, Height, Width]
    c = del_batch_axis(b)      # Array[Height, Width]
    d = add_batch_channels(a)  # Array[Batch, Height, Width, Channels]

Normal ``TypeVar`` instances can also be prefixed and/or suffixed:

::

    T = TypeVar('T')
    Ts = TypeVarTuple('Ts')

    def prefix_tuple(
        x: T,
        y: Tuple[*Ts]
    ) -> Tuple[T, *Ts]: ...

    z = prefix_tuple(x=0, y=(True, 'a'))
    # Inferred type of z is Tuple[int, bool, str]

``*args`` as a Type Variable Tuple
----------------------------------

PEP 484 states that when a type annotation is provided for ``*args``, every
argument must be of the type annotated. That is, if we specify ``*args`` to
be type ``int``, then *all* arguments must be of type ``int``.
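
To make the existing PEP 484 rule concrete, here is a minimal sketch of a homogeneously typed ``*args``. The function name ``total`` is a hypothetical example, not part of this PEP or of any library.

```python
def total(*args: int) -> int:
    # Under PEP 484, ``*args: int`` means every positional argument is an
    # int; inside the function, ``args`` is a tuple of ints of any length.
    return sum(args)


print(total(1, 2, 3))
```

A checker following PEP 484 would be expected to reject a call such as ``total(1, 'a')``, because one annotation has to cover every positional argument.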
This limits our ability to specify the type signatures of functions that
take heterogeneous argument types.

If ``*args`` is annotated as a type variable tuple, however, the types of
the individual arguments become the types in the type variable tuple:

::

    Ts = TypeVarTuple('Ts')

    def args_to_tuple(*args: *Ts) -> Tuple[*Ts]: ...

    args_to_tuple(1, 'a')  # Inferred type is Tuple[int, str]

If no arguments are passed, the type variable tuple behaves like an
empty tuple, ``Tuple[()]``.

Note that, in keeping with the rule that type variable tuples must always
be used unpacked, annotating ``*args`` as being a plain type variable tuple
instance is *not* allowed:

::

    def foo(*args: Ts): ...  # NOT valid

``*args`` is the only case where an argument can be annotated as ``*Ts``
directly; other arguments should use ``*Ts`` to parameterise something else,
e.g. ``Tuple[*Ts]``. If ``*args`` itself is annotated as ``Tuple[*Ts]``, the
old behaviour still applies: all arguments must be a ``Tuple`` parameterised
with the same types.

::

    def foo(*args: Tuple[*Ts]): ...

    foo((0,), (1,))    # Valid
    foo((0,), (1, 2))  # Error
    foo((0,), ('1',))  # Error

Following `Type Variable Tuples Must Have Known Length`_, note
that the following should *not* type-check as valid (even though it is, of
course, valid at runtime):

::

    def foo(*args: *Ts): ...

    def bar(x: Tuple[int, ...]):
      foo(*x)  # NOT valid

Finally, note that a type variable tuple may *not* be used as the type of
``**kwargs``. (We do not yet know of a use case for this feature, so we
prefer to leave the ground fresh for a potential future PEP.)

::

    # NOT valid
    def foo(**kwargs: *Ts): ...

Type Variable Tuples with ``Callable``
--------------------------------------

Type variable tuples can also be used in the arguments section of a
``Callable``:

::

    class Process:
      def __init__(
        self,
        target: Callable[[*Ts], Any],
        args: Tuple[*Ts]
      ): ...

    def func(arg1: int, arg2: str): ...

    Process(target=func, args=(0, 'foo'))  # Valid
    Process(target=func, args=('foo', 0))  # Error

Other types and normal type variables can also be prefixed/suffixed
to the type variable tuple:

::

    T = TypeVar('T')

    def foo(f: Callable[[int, *Ts, T], Tuple[T, *Ts]]): ...

Aliases
-------

Generic aliases can be created using a type variable tuple in
a similar way to regular type variables:

::

    IntTuple = Tuple[int, *Ts]
    NamedTuple = Tuple[str, Tuple[*Ts]]

    IntTuple[float, bool]  # Equivalent to Tuple[int, float, bool]
    NamedTuple[int, int]   # Equivalent to Tuple[str, Tuple[int, int]]

As this example shows, all type parameters passed to the alias are
bound to the type variable tuple.

Importantly for our original ``Array`` example (see `Summary Examples`_),
this allows us to define convenience aliases for arrays of a fixed shape
or datatype:

::

    Shape = TypeVarTuple('Shape')
    DType = TypeVar('DType')
    class Array(Generic[DType, *Shape]): ...

    # E.g. Float32Array[Height, Width, Channels]
    Float32Array = Array[np.float32, *Shape]

    # E.g. Array1D[np.uint8]
    Array1D = Array[DType, Any]

If an explicitly empty type parameter list is given, the type variable
tuple in the alias is set empty:

::

    IntTuple[()]    # Equivalent to Tuple[int]
    NamedTuple[()]  # Equivalent to Tuple[str, Tuple[()]]

If the type parameter list is omitted entirely, the alias is
compatible with arbitrary type parameters:

::

    def takes_float_array_of_any_shape(x: Float32Array): ...
    x: Float32Array[Height, Width] = Array()
    takes_float_array_of_any_shape(x)  # Valid

    def takes_float_array_with_specific_shape(
        y: Float32Array[Height, Width]
    ): ...
    y: Float32Array = Array()
    takes_float_array_with_specific_shape(y)  # Valid

Normal ``TypeVar`` instances can also be used in such aliases:

::

    T = TypeVar('T')
    Foo = Tuple[*Ts, T]

    # Ts bound to Tuple[str], T to int
    Foo[str, int]
    # Ts bound to Tuple[()], T to int
    Foo[int]
    # T bound to Any, Ts to an arbitrary number of Any
    Foo

Overloads for Accessing Individual Types
----------------------------------------

For situations where we require access to each individual type in the type
variable tuple, overloads can be used with individual ``TypeVar`` instances
in place of the type variable tuple:

::

    Shape = TypeVarTuple('Shape')
    Axis1 = TypeVar('Axis1')
    Axis2 = TypeVar('Axis2')
    Axis3 = TypeVar('Axis3')

    class Array(Generic[*Shape]):

      @overload
      def transpose(
        self: Array[Axis1, Axis2]
      ) -> Array[Axis2, Axis1]: ...

      @overload
      def transpose(
        self: Array[Axis1, Axis2, Axis3]
      ) -> Array[Axis3, Axis2, Axis1]: ...

(For array shape operations in particular, having to specify
overloads for each possible rank is, of course, a rather cumbersome
solution. However, it's the best we can do without additional type
manipulation mechanisms. We plan to introduce these in a future PEP.)

Rationale and Rejected Ideas
============================

Supporting Variadicity Through Aliases
--------------------------------------

As noted in the introduction, it *is* possible to avoid variadic generics
by simply defining aliases for each possible number of type parameters:

::

    class Array1(Generic[Axis1]): ...
    class Array2(Generic[Axis1, Axis2]): ...

However, this seems somewhat clumsy - it requires users to unnecessarily
pepper their code with 1s, 2s, and so on for each rank necessary.

Construction of ``TypeVarTuple``
--------------------------------

``TypeVarTuple`` began as ``ListVariadic``, based on its naming in
an early implementation in Pyre.
We then changed this to ``TypeVar(list=True)``, on the basis that a)
it better emphasises the similarity to ``TypeVar``, and b) the meaning
of 'list' is more easily understood than the jargon of 'variadic'.

Once we'd decided that a variadic type variable should behave like a
``Tuple``, we also considered ``TypeVar(bound=Tuple)``, which is similarly
intuitive and accomplishes most of what we wanted without requiring any new
arguments to ``TypeVar``. However, we realised this may constrain us in the
future, if for example we want type bounds or variance to function slightly
differently for variadic type variables than what the semantics of
``TypeVar`` might otherwise imply. Also, we may later wish to support
arguments that should not be supported by regular type variables (such as
``arbitrary_len`` [#arbitrary_len]_).

We therefore settled on ``TypeVarTuple``.

Behaviour when Type Parameters are not Specified
------------------------------------------------

In order to support gradual typing, this PEP states that *both*
of the following examples should type-check correctly:

::

    def takes_any_array(x: Array): ...
    x: Array[Height, Width]
    takes_any_array(x)

    def takes_specific_array(y: Array[Height, Width]): ...
    y: Array
    takes_specific_array(y)

Note that this is in contrast to the behaviour of the only currently-existing
variadic type in Python, ``Tuple``:

::

    def takes_any_tuple(x: Tuple): ...
    x: Tuple[int, str]
    takes_any_tuple(x)  # Valid

    def takes_specific_tuple(y: Tuple[int, str]): ...
    y: Tuple
    takes_specific_tuple(y)  # Error

The rules for ``Tuple`` were deliberately chosen such that the latter case
is an error: it was thought to be more likely that the programmer has made a
mistake than that the function expects a specific kind of ``Tuple`` but the
specific kind of ``Tuple`` passed is unknown to the type checker.
Additionally, ``Tuple`` is something of a special case, in that it is used
to represent immutable sequences.
That is, if an object's type is inferred to be an unparameterised ``Tuple``,
it is not necessarily because of incomplete typing.

In contrast, if an object's type is inferred to be an unparameterised
``Array``, it is much more likely that the user has simply not yet fully
annotated their code, or that the signature of a shape-manipulating library
function cannot yet be expressed using the typing system and therefore
returning a plain ``Array`` is the only option. We rarely deal with arrays
of truly arbitrary shape; in certain cases, *some* parts of the shape will
be arbitrary - for example, when dealing with sequences, the first two parts
of the shape are often 'batch' and 'time' - but we plan to support these
cases explicitly in a future PEP with a syntax such as
``Array[Batch, Time, ...]``.

We therefore made the decision to have variadic generics *other* than
``Tuple`` behave differently, in order to give the user more flexibility
in how much of their code they wish to annotate, and to enable compatibility
between old unannotated code and new versions of libraries which do use
these type annotations.

Backwards Compatibility
=======================

The ``Unpack`` version of the PEP should be back-portable to previous
versions of Python.

Gradual typing is enabled by the fact that unparameterised variadic classes
are compatible with an arbitrary number of type parameters. This means
that if existing classes are made generic, a) all existing (unparameterised)
uses of the class will still work, and b) parameterised and unparameterised
versions of the class can be used together (relevant if, for example, library
code is updated to use parameters while user code is not, or vice-versa).

Reference Implementation
========================

Two reference implementations of type-checking functionality exist:
one in Pyre, as of TODO, and one in Pyright, as of v1.1.108.

A preliminary implementation of the ``Unpack`` version of the PEP in CPython
is available in `cpython/23527`_.
A preliminary implementation of the version using the star operator, based
on an early implementation of PEP 637, is also available at
`mrahtz/cpython/pep637+646`_.

Footnotes
==========

.. [#batch] 'Batch' is machine learning parlance for 'a number of'.

.. [#array] We use the term 'array' to refer to a matrix with an arbitrary
   number of dimensions. In NumPy, the corresponding class is the ``ndarray``;
   in TensorFlow, the ``Tensor``; and so on.

.. [#timebatch] If the shape begins with 'batch × time', then
   ``videos_batch[0][1]`` would select the second frame of the first video. If the
   shape begins with 'time × batch', then ``videos_batch[1][0]`` would select the
   same frame.

Acknowledgements
================

Thank you to **Alfonso Castaño**, **Antoine Pitrou**, **Bas v.B.**,
**David Foster**, **Dimitris Vardoulakis**, **Eric Traut**,
**Guido van Rossum**, **Jia Chen**, **Lucio Fernandez-Arjona**,
**Nikita Sobolev**, **Peilonrayz**, **Rebecca Chen**, **Sergei Lebedev**
and **Vladimir Mikulik** for helpful feedback and suggestions on drafts of
this PEP.

Thank you especially to **Lucio**, for suggesting the star syntax, which has
made multiple aspects of this proposal much more concise and intuitive.

Resources
=========

Discussions on variadic generics in Python started in 2016 with Issue 193
on the python/typing GitHub repository [#typing193]_.

Inspired by this discussion, **Ivan Levkivskyi** made a concrete proposal
at PyCon 2019, summarised in notes on 'Type system improvements'
[#type-improvements]_ and 'Static typing of Python numeric stack'
[#numeric-stack]_.

Expanding on these ideas, **Mark Mendoza** and **Vincent Siles** gave a
presentation on 'Variadic Type Variables for Decorators and Tensors'
[#variadic-type-variables]_ at the 2019 Python Typing Summit.

References
==========

.. [#typing193] Python typing issue #193:
   https://github.com/python/typing/issues/193

.. [#type-improvements] Ivan Levkivskyi, 'Type system improvements', PyCon 2019:
   https://paper.dropbox.com/doc/Type-system-improvements-HHOkniMG9WcCgS0LzXZAe

.. [#numeric-stack] Ivan Levkivskyi, 'Static typing of Python numeric stack', PyCon 2019:
   https://paper.dropbox.com/doc/Static-typing-of-Python-numeric-stack-summary-6ZQzTkgN6e0oXko8fEWwN

.. [#typing-ideas] Stephan Hoyer, 'Ideas for array shape typing in Python':
   https://docs.google.com/document/d/1vpMse4c6DrWH5rq2tQSx3qwP_m_0lyn-Ij4WHqQqRHY/edit

.. [#variadic-type-variables] Mark Mendoza, 'Variadic Type Variables for Decorators and Tensors', Python Typing Summit 2019:
   https://github.com/facebook/pyre-check/blob/ae85c0c6e99e3bbfc92ec55104bfdc5b9b3097b2/docs/Variadic_Type_Variables_for_Decorators_and_Tensors.pdf

.. [#syntax-proposal] Matthew Rahtz et al., 'Shape annotation syntax proposal':
   https://docs.google.com/document/d/1But-hjet8-djv519HEKvBN6Ik2lW3yu0ojZo6pG9osY/edit

.. [#arbitrary_len] Discussion on Python typing-sig mailing list:
   https://mail.python.org/archives/list/typing-sig@python.org/thread/SQVTQYWIOI4TIO7NNBTFFWFMSMS2TA4J/

.. _cpython/23527: https://github.com/python/cpython/pull/24527

.. _mrahtz/cpython/pep637+646: https://github.com/mrahtz/cpython/tree/pep637%2B646

Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: