PEP: 0646 Title: Variadic Generics Author: Mark Mendoza , Matthew Rahtz , Vincent Siles Sponsor: Guido van Rossum Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 16-Sep-2020 Python-Version: 3.10 Post-History: 07-Oct-2020 Abstract ======== PEP 484 introduced ``TypeVar``, enabling creation of generics parameterised with a single type. In this PEP, we introduce ``TypeTuple``, enabling parameterisation with an *arbitrary* number of types - that is, a *variadic* type variable, enabling *variadic* generics. This allows the type of array-like structures in numerical computing libraries such as NumPy and TensorFlow to be parameterised with the array *shape*, enabling static type checkers to catch shape-related bugs in code that uses these libraries. Motivation ========== There are two main use-cases for variadic generics. The primary motivation is to enable typing of array shapes in numerical computing libraries such as NumPy and TensorFlow. This is the use-case much of the PEP will focus on. Additionally, variadic generics allow us to concisely specify the type signature of ``map`` and ``zip``. We discuss each of these three motivations below. Array Shapes ------------- In the context of numerical computation with libraries such as NumPy and TensorFlow, the *shape* of arguments is often just as important as the argument *type*. For example, consider the following function which converts a batch [#batch]_ of videos to grayscale: :: def to_gray(videos: Tensor): ... From the signature alone, it is not obvious what shape of array [#array]_ we should pass for the ``videos`` argument. Possibilities include, for example, batch × time × height × width × channels and time × batch × channels × height × width. [#timebatch]_ Ideally, we should have some way of making the required shape clear in the signature itself. Multiple proposals [#numeric-stack]_ [#typing-ideas]_ [#syntax-proposal]_ have suggested the use of the standard generics syntax for this purpose. We would write: :: def to_gray(videos: Tensor[Time, Batch, Height, Width, Channels]): ... However, note that arrays can be of arbitrary rank - ``Tensor`` as used above is generic in an arbitrary number of axes. One way around this would be to use a different ``Tensor`` class for each rank... :: Axis1 = TypeVar('Axis1') Axis2 = TypeVar('Axis2') class Tensor1(Generic[Axis1]): ... class Tensor2(Generic[Axis1, Axis2]): ... ...but this would be cumbersome, both for users (who would have to sprinkle 1s and 2s and so on throughout their code) and for the authors of tensor libraries (who would have to duplicate implementations throughout multiple classes). Variadic generics are necessary for a ``Tensor`` that is generic in an arbitrary number of axes to be cleanly defined as a single class. ``map`` and ``zip`` ------------------- PEP 612 [#pep-612]_ introduced ``ParamSpec``, enabling parameter types of one callable to be forwarded to another callable. However, in many cases we actually wish to *transform* parameter types before using them elsewhere in the signature. Consider, for example, the signature of ``map`` for a particular choice of function and iterables: :: def func(int, str) -> float: ... iter1: List[int] iter2: List[str] def map(func: Callable[[int, str], float], iter1: Iterable[int], iter2: Iterable[str]) -> Iterable[float]: ... Note that the number of iterables passed to ``map`` is dependent on the supplied function, and that the types of those iterables must correspond to the types of supplied function's arguments. A similar example is ``zip``: :: iter1: List[int] iter2: List[str] def zip(iter1: Iterable[int], iter2: Iterable[str]) -> Iterable[Tuple[int, str]]: ... Neither of these signatures can be specified in the general case using existing typing mechanisms. The signature of ``zip``, for example, is currently specified [#zip-sig]_ with a number of overloads. Specification ============= In order to support the above use-cases, we introduce: * ``TypeTupleVar``, a ``TypeVar`` that acts as a placeholder not for a single type but for an *arbitrary* number of types. * A new syntax for parameterizing generic functions and classes using a type tuple variable. * Two new type operators, ``Apply`` and ``Map``. These are described in detail below. Type Tuple Variables -------------------- In the same way that a normal type variable is a stand-in for a single type, a type *tuple* variable is a stand-in for an arbitrary number of types in a flat ordered list. Type tuple variables are created with: :: from typing import TypeTupleVar Ts = TypeTupleVar('Ts') A type tuple variable behaves in a similar way to a parameterized ``Tuple``. For example, in a generic object instantiated with type parameters ``int`` and ``str``, ``Ts`` behaves similarly to ``Tuple[int, str]``. Parameterizing Types: Star Operator ''''''''''''''''''''''''''''''''''' One use of type tuple variables are to parameterize variadic types such as ``Tuple``. To differentiate type tuple variables from normal type variables, we introduce a new use for the star operator: :: Tuple[*Ts] The star operator here serves to 'expand' the type tuple into its component types. For example, in a generic object instantiated with ``Ts`` being ``int`` and ``str``, then ``Tuple[*Ts]`` would be equivalent to ``Tuple[int, str]``. For consistency, the star operator can also be applied directly to a parameterised ``Tuple``: :: Types = Tuple[int, str, bool, float, double] Tuple[*Types] # Also valid Parameterizing Types: ``Expand`` '''''''''''''''''''''''''''''''' Because the new use of the star operator requires a syntax change and is therefore incompatible with previous versions of Python, we also introduce the ``Expand`` type operator for use in existing versions of Python. ``Expand`` behaves identically to the star operator, but without requiring a syntax change. In any place you would normally write ``*Ts``, you can also write ``Expand[Ts]``. Parameterizing Function Signatures and Classes '''''''''''''''''''''''''''''''''''''''''''''' Type tuple variables can be used anywhere a normal ``TupleVar`` can. For example, in class definitions, function signatures, and variable annotations: :: Shape = TypeTupleVar('Shape') class Tensor(Generic[*Shape]): def __init__(self, shape: Tuple[int, ...]): self.shape: Shape = shape def __abs__(self) -> Tensor[*Shape]: ... def __add__(self, other: Tensor[*Shape]) -> Tensor[*Shape]: ... class Height: pass class Width: pass x: Tensor[Height, Width] = Tensor(shape=(640, 480)) x.shape # Inferred type is Tuple[Height, Width] y = abs(x) # Tensor[Height, Width] z = x + y # Tensor[Height, Width] Unexpanded Type Tuple Variables ''''''''''''''''''''''''''''''' Until now, we have always expanded type tuple variables. However, type tuple variables can also be used without being expanded. When used in this way, the type tuple variable behaves like a ``Tuple`` parameterised by the types that the type tuple variable is bound to. That is: :: def foo(x: Tuple[*Ts]) -> Tuple[*Ts]: ... # could also be written as def foo(x: Ts) -> Ts: ... Type tuple variables can also be used unexpanded in in the context of generic classes. However, note that when used in this way, type parameters to the generic class must be explicitly enclosed in a ``Tuple``. :: class Foo(Generic[Ts]): ... foo: Foo[Tuple[int, str]] See `Concatenating Multiple type tuple Variables`_ below for why this is important. ``*args`` as a Type Tuple Variable '''''''''''''''''''''''''''''''''' PEP 484 states that when a type annotation is provided for ``*args``, each argument must be of the type annotated. That is, if we specify ``*args`` to be type ``int``, then *all* arguments must be of type ``int``. This limits our ability to specify the type signatures of functions that take heterogeneous argument types. If ``*args`` is annotated as being an expanded type tuple variable, however, the types of the individual arguments become the types in the type tuple: :: def args_to_tuple(*args: *Ts) -> Tuple[*Ts]: ... args_to_tuple(1, 'a') # Inferred type is Tuple[int, str] Inside the body of ``args_to_tuple``, the type of ``args`` is ``Tuple[*Ts]`` (with ``*Ts`` substituted for the actual types at runtime). Note that, for consistency, the following is also valid syntactically: :: def foo(*args: *Tuple[int, str]): ... However, since it is a strange thing to do (why not just specify the arguments directly as ``arg1: int, arg2: str``?), we recommend type checkers emit a warning when coming across such annotations. Also note that when a type tuple variable is used in this way, it *must* be in conjunction with the star operator: :: def foo(*args: Ts): ... # NOT valid Finally, note that a type tuple variable may *not* be used as the type of ``**kwargs``. (We do not yet know of a use-case for this feature, so prefer to leave the ground fresh for a potential future PEP.) :: def foo(**kwargs: *Ts): ... # NOT valid ``Map`` ------- To enable typing of functions such as ``map`` and ``zip``, we introduce the ``Map`` type operator. Not to be confused with the existing operator ``typing.Mapping``, ``Map`` is analogous to ``map``, but for types: :: from typing import Map def args_to_tuples(*args: *Ts) -> Map[Tuple, Ts]: ... args_to_tuples(1, 'a') # Inferred type is Tuple[Tuple[int], Tuple[str]] ``Map`` takes two operands. The first operand is a parameterizable type (or type alias [#type_aliases]) such as ``Tuple`` or ``List``. The second operand is a type tuple variable or a parameterized ``Tuple`` such as ``Tuple[int, str]``. The result of ``Map`` is a ``Tuple``, where the Nth type in the ``Tuple`` is the first operand parameterized by the Nth type in the second operand. Because ``Map`` returns a parameterized ``Tuple``, it can be used anywhere that a type tuple variable would be. For example: :: # Equivalent to 'arg1: List[T1], arg2: List[T2], ...' def foo(*args: *Map[List, Ts]): ... # Equivalent to '-> Tuple[List[T1], List[T2], ...]' def bar(*args: *Ts) -> Map[List, Ts]: ... bar() # Inferred type is Tuple[()] (an empty tuple) bar(1) # Inferred type is Tuple[List[int]] bar(1, 'a') # Inferred type is Tuple[List[int], List[str]] ``map`` and ``zip`` ''''''''''''''''''' ``Map`` allows us to specify the signature of ``map`` as: :: ArgTs = TypeTupleVar('ArgTs') ReturnT = TypeVar('ReturnT') def map(func: Callable[[*ArgTs], ReturnT], *iterables: *Map[Iterable, ArgTs]) -> Iterable[ReturnT]: ... def func(int, str) -> float: ... # ArgTs is bound to Tuple[int, str] # Map[Iterable, ArgTs] is Iterable[int], Iterable[str] # Therefore, iter1 must be type Iterable[int], # and iter2 must be type Iterable[str] map(func, iter1, iter2) Similarly, we can specify the signature of ``zip`` as: :: def zip(*iterables: *Map[Iterable, ArgTs]) -> Iterable[*ArgTs]): ... l1: List[int] l2: List[str] zip(l1, l2) # Iterable[int, str] Nesting ''''''' Because the type of the result of ``Map`` is the same as the type of its second operand, the result of one ``Map`` *can* be used as the input to another ``Map``: :: Map[Tuple, *Map[Tuple, Ts]] # Valid! Accessing Individual Types -------------------------- ``Map`` allows us to operate on types in a bulk fashion. For situations where we require access to each individual type, overloads can be used with individual ``TypeVar`` instances in place of the type tuple variable: :: Shape = TypeTupleVar('Shape') Axis1 = TypeVar('Axis1') Axis2 = TypeVar('Axis2') Axis3 = TypeVar('Axis3') class Tensor(Generic[*Shape]): ... @overload class Tensor(Generic[Axis1, Axis2]): def transpose(self) -> Tensor[Axis2, Axis1]: ... @overload class Tensor(Generic[Axis1, Axis2, Axis3]): def transpose(self) -> Tensor[Axis3, Axis2, Axis1]: ... Concatenating Other Types to a Type Tuple Variable -------------------------------------------------- If a type tuple variable appears with other types in the same type parameter list, the effect is to concatenate those types with the types in the type tuple variable: :: Shape = TypeTupleVar('Shape') class Batch: pass class Height: pass class Width: pass class Tensor(Generic[*Shape]): ... def add_batch(x: Tensor[*Shape]) -> Tensor[Batch, *Shape]: ... x: Tensor[Height, Width] add_batch(x) # Inferred type is Tensor[Batch, Height, Width] Type tuple variables can also be combined with regular ``TypeVar`` instances: :: T1 = TypeVar('T1') T2 = TypeVar('T2') class Foo(Generic[T1, T2, *Ts]): ... foo: Foo[int, str, bool, float] # T1=int, T2=str, Ts=Tuple[bool, float] Concatenating Multiple Type Tuple Variables ------------------------------------------- If multiple type tuple variables appear in a parameter list, in order to prevent ambiguity about which types would be bound to which type tuple variables, the type tuple variables must not be expanded: :: # NOT allowed class Bar(Generic[*Ts1, *Ts2]): ... # How would we decide which types are bound to Ts1 # and which are bound to Ts2? bar: Bar[int, str, bool] # The right way class Bar(Generic[Ts1, Ts2]): ... bar: Bar[Tuple[int], Tuple[str, bool]] Rationale and Rejected Ideas ============================ Supporting Variadicity Through aliases -------------------------------------- As noted in the introduction, it **is** possible to avoid variadic generics by simply defining aliases for each possible number of type parameters: :: class Tensor1(Generic[Axis1]): ... class Tensor2(Generic[Axis1, Axis2]): ... However, this seems somewhat clumsy - it requires users to unnecessarily pepper their code with 1s, 2s, and so on for each rank necessary. Naming of ``Map`` ----------------- One downside to the name ``Map`` is that it might suggest a hash map. We considered a number of different options for the name of this operator. * ``ForEach``. This is rather long, and we thought might imply a side-effect. * ``Transform``. The meaning of this isn't obvious enough at first glance. * ``Apply``. This is inconsistent with ``apply``, an older Python function which enabled conversion of iterables to arguments before the star operator was introduced. In the end, we decided that ``Map`` was good enough. Naming of ``TypeTupleVar`` -------------------------- ``TypeTupleVar`` began as ``ListVariadic``, based on its naming in an early implementation in Pyre. We then changed this to ``TypeVar(list=True)``, on the basis that a) it better emphasises the similarity to ``TypeVar``, and b) the meaning of 'list' is more easily understood than the jargon of 'variadic'. We finally settled on ``TypeTupleVar`` based on the justification that c) this emphasises the tuple-like behaviour, and d) type tuple variables are a sufficiently different kind of thing to regular type variables that we may later wish to support keyword arguments to its constructor that should not be supported by regular type variables (such as ``arbitrary_len`` [#arbitrary_len]_). Accessing Individual Types Without Overloads -------------------------------------------- We chose to support access to individual types in the type tuple variable using overloads (see the `Accessing Individual Types`_ section). One alternative would have been to allow explicit access to arbitrary parts of the type tuple variable - for example, through indexing: :: def foo(t: Tuple[*Ts]): x: Ts[1] = t[1] We decided to omit this mechanism from this PEP because a) it adds complexity, b) we were not aware of any use-cases that need it, and c) if it turns out to be needed in the future, it can easily be added in a future PEP. Integer Generics ---------------- Consider a function such as `np.tile`: :: x = np.zeros((3,)) # A tensor of length 3 y = np.tile(x, reps=2) # y is now length 6 Intuitively, we would specify the signature of such a function as: :: @overload def tile(A: Tensor[N], reps: Literal[2]) -> Tensor[2*N]: ... # ...and other overloads for different values of `reps` ``N`` is *sort* of like a type variable. However, type variables stand in for *types*, whereas here we want ``N`` to stand in for a particular *value*. ``N`` should be some sort of 'integer type variable'. (Note that ``N`` could *not* be created as simply ``TypeTupleVar('N', bound=int)``. This would state that ``N`` could stand for an ``int`` or any *subtype* of ``int``. For our signature above, we would need ``N`` to stand for any *instance* of type ``int``.) We decided to omit integer type variables for this PEP, postponing it for a future PEP when necessary. Integer Parameterization ------------------------ The examples of this PEP have parameterised tensor types using the semantic meaning of each axes, e.g. ``Tensor[Batch, Time]``. However, we may also wish to parameterize using the actual integer value of each part of the shape, such as ``Tensor[Literal[64], Literal[64]]``. There are two aspects related to such integer parameterization that we decided to ignore in this PEP: **Examples of integer parameterization**. Thought it clearly *is* valid to parameterize with literal types, we wish to encourage the use of semantic labelling of tensor axes wherever possible: having each axis labelled serves as extra protection against mistakes when manipulating axes. **Syntactic sugar for integer parameterization**. Typing ``Literal`` is cumbersome; ideally, we could write ``Tensor[64, 64]`` as syntactic sugar for ``Tensor[Literal[64], Literal[64]]``. However, this would require an inconsistency: because of forward referencing, ``Tensor['Batch']`` and ``Tensor[Literal['Batch']]`` mean different things. For this to work, we would have to stipulate this sugar only applies for integers. We leave this discussion for a future PEP. (If you do wish to employ such types in your code currently, we recommend ``import Typing.Literal as L`` enabling the much shorter ``L[64]``.) Checking the Number of Types in a Variadic Generic -------------------------------------------------- Consider reduction operations, which behave as: :: x = np.zeros((2, 3, 5)) reduce_sum(x, axis=0) # Shape (3, 5) reduce_sum(x, axis=1) # Shape (2, 5) One way to compactly specify the signature of these operations would be write something like: :: Shape = TypeTupleVar('Shape') # Tensor of rank N goes in, tensor of rank N-1 comes out def reduce_sum(x: Tensor[Shape[N]], axis: int) -> Tensor[Shape[N-1]]: ... ``Shape[N]`` here states that number of types in ``Shapes`` is bound to ``N``, where ``N`` is some object that we can perform arithmetic on. Lacking an urgent use-case for this feature, we omit it from this PEP, leaving it to a future PEP if necessary. (Note that reduction operations are only used as an example here. Reduction functions can in fact be typed without this feature, using overloads: :: @overload def reduce_sum(x: Tensor[A, B], axis: Literal[0]) -> Tensor[B]: ... @overload def reduce_sum(x: Tensor[A, B], axis: Literal[1]) -> Tensor[A]: ... ... Although more verbose, typing reduction operations this way is superior to the approach above, since it preserves information about *which* axis has been removed.) Backwards Compatibility ======================= TODO * ``Tuple`` needs to be upgraded to support parameterization with a a type tuple variable. Reference Implementation ======================== TODO Footnotes ========== .. [#hkt] A third potential use is in enabling higher-kinded types that take an arbitrary number of type operands, but we do not discuss this use here. .. [#batch] 'Batch' is machine learning parlance for 'a number of'. .. [#array] We use the term 'array' to refer to a matrix with an arbitrary number of dimensions. In NumPy, the corresponding class is the ``ndarray``; in TensorFlow, the ``Tensor``; and so on. .. [#timebatch] If the shape begins with 'batch × time', then ``videos_batch[0][1]`` would select the second frame of the first video. If the shape begins with 'time × batch', then ``videos_batch[1][0]`` would select the same frame. .. [#kwargs] In the case of ``**kwargs``, we mean the Nth argument as it appears in the function *definition*, *not* the Nth keyword argument specified in the function *call*. .. [#type_aliases] For example, in ``asyncio`` [#asyncio]_, it is convenient to define a type alias ``_FutureT = Union[Future[_T], Generator[Any, None, _T], Awaitable[_T]]``. We should also be able to apply ``Map`` to alias - e.g. ``Map[_FutureT, Ts]``. References ========== .. [#pep-612] PEP 612, "Parameter Specification Variables": https://www.python.org/dev/peps/pep-0612 .. [#pep-484] PEP 484, "Type Hints": https://www.python.org/dev/peps/pep-0484 .. [#numeric-stack] Static typing of Python numeric stack: https://paper.dropbox.com/doc/Static-typing-of-Python-numeric-stack-summary-6ZQzTkgN6e0oXko8fEWwN .. [#typing-ideas] Ideas for array shape typing in Python: https://docs.google.com/document/d/1vpMse4c6DrWH5rq2tQSx3qwP_m_0lyn-Ij4WHqQqRHY/edit .. [#syntax-proposal] Shape annotation syntax proposal: https://docs.google.com/document/d/1But-hjet8-djv519HEKvBN6Ik2lW3yu0ojZo6pG9osY/edit .. [#zip-sig] ``typeshed/builtins.pyi``: https://github.com/python/typeshed/blob/27dfbf68aaffab4f1ded7dc1b96f6f82f536a09d/stdlib/2and3/builtins.pyi#L1710-L1733 .. [#asyncio] ``typeshed/asyncio/tasks.pyi``: https://github.com/python/typeshed/blob/193c7cb93283ad4ca2a65df74c565e56bfe72b7e/stdlib/3/asyncio/tasks.pyi#L45-L154 .. [#arbitrary_len] Discussion on Python typing-sig mailing list: https://mail.python.org/archives/list/typing-sig@python.org/thread/SQVTQYWIOI4TIO7NNBTFFWFMSMS2TA4J/ Acknowledgements ================ Thank you to **Alfonso Castaño**, **Antoine Pitrou**, **Bas v.B.**, **David Foster**, **Dimitris Vardoulakis**, **Guido van Rossum**, **Jia Chen**, **Lucio Fernandez-Arjona**, **Nikita Sobolev**, **Peilonrayz**, **Pradeep Kumar Srinivasan**, **Rebecca Chen**, **Sergei Lebedev** and **Vladimir Mikulik** for helpful feedback and suggestions on drafts of this PEP. Thank you especially to **Pradeep** for numerous key contributions, including pointing out that unexpanded type tuples allow for clean concatenation of multiple type tuples, and to **Lucio**, for suggesting the star syntax, which has made multiple aspects of this proposal much more concise and intuitive. Resources ========= Discussions on variadic generics in Python started in 2016 with `Issue 193`__ on the python/typing GitHub repository. __ https://github.com/python/typing/issues/193 Inspired by this discussion, **Ivan Levkivskyi** made a concrete proposal at PyCon 2019, summarised in `Type system improvements`__ and `Static typing of Python numeric stack`__. __ https://paper.dropbox.com/doc/Type-system-improvements-HHOkniMG9WcCgS0LzXZAe __ https://paper.dropbox.com/doc/Static-typing-of-Python-numeric-stack-summary-6ZQzTkgN6e0oXko8fEWwN Expanding on these ideas, **Mark Mendoza** and **Vincent Siles** gave a presentation on `Variadic Type Variables for Decorators and Tensors`__ at the 2019 Python Typing Summit. __ https://github.com/facebook/pyre-check/blob/ae85c0c6e99e3bbfc92ec55104bfdc5b9b3097b2/docs/Variadic_Type_Variables_for_Decorators_and_Tensors.pdf Copyright ========= This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive. .. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: