PEP 646: Add some broader context (#1904)

These sections were requested in some feedback we got when talking to NumPy and JAX folks. OK, this does make an already-long PEP even longer, but I do think this context is important - especially for folks who don't have strong intuitions for how parametric typing works in Python.
Matthew Rahtz 2021-04-18 19:59:20 +01:00 committed by GitHub
parent 051b07cced
commit 5578160afa
1 changed file with 205 additions and 1 deletion

@@ -610,6 +610,16 @@ manipulation mechanisms. We plan to introduce these in a future PEP.)
Rationale and Rejected Ideas
============================
Shape Arithmetic
----------------
Considering the use case of array shapes in particular, note that as of
this PEP, it is not yet possible to describe arithmetic transformations
of array dimensions - for example,
``def repeat_each_element(x: Array[N]) -> Array[2*N]``. We consider
this out-of-scope for the current PEP, but plan to propose additional
mechanisms that *will* enable this in a future PEP.
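To make the gap concrete: the following runnable sketch (using NumPy purely
for illustration) has to check the doubled-length invariant at runtime,
because the signature cannot yet express it:
::
import numpy as np

def repeat_each_element(x: np.ndarray) -> np.ndarray:
    # The result always has exactly 2 * len(x) elements, but today's
    # type system gives us no way to state this relationship statically.
    return np.repeat(x, 2)

out = repeat_each_element(np.arange(3))
assert out.shape == (6,)  # the invariant we would like to spell Array[2*N]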
Supporting Variadicity Through Aliases
--------------------------------------
@@ -768,11 +778,205 @@ is available in `cpython/23527`_. A preliminary version of the version
using the star operator, based on an early implementation of PEP 637,
is also available at `mrahtz/cpython/pep637+646`_.
Appendix A: Shape Typing Use Cases
==================================
To give this PEP additional context for those particularly interested in the
array typing use case, in this appendix we expand on the different ways
this PEP can be used for specifying shape-based subtypes.
Use Case 1: Specifying Shape Values
-----------------------------------
The simplest way to parameterise array types is using ``Literal``
type parameters - e.g. ``Array[Literal[64], Literal[64]]``.
We can attach names to each parameter using normal type variables:
::
K = TypeVar('K')
N = TypeVar('N')
def matrix_vector_multiply(x: Array[K, N], y: Array[N]) -> Array[K]: ...
a: Array[Literal[64], Literal[32]]
b: Array[Literal[32]]
matrix_vector_multiply(a, b)
# Result is Array[Literal[64]]
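For readers who would like something concrete to experiment with, here is a
minimal sketch of this use case using today's fixed-rank generics; ``Matrix``
and ``Vector`` are stand-ins for the variadic ``Array`` proposed in this PEP:
::
from typing import Generic, Literal, TypeVar

K = TypeVar('K')
N = TypeVar('N')

# Fixed-rank stand-ins for the variadic Array proposed in this PEP
class Matrix(Generic[K, N]): ...
class Vector(Generic[N]): ...

def matrix_vector_multiply(x: Matrix[K, N], y: Vector[N]) -> Vector[K]:
    return Vector()  # implementation elided; only the signature matters here

a: Matrix[Literal[64], Literal[32]] = Matrix()
b: Vector[Literal[32]] = Vector()
result = matrix_vector_multiply(a, b)
# A type checker infers K = Literal[64] and N = Literal[32],
# so `result` is Vector[Literal[64]]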
Note that such names have a purely local scope. That is, the name
``K`` is bound to ``Literal[64]`` only within ``matrix_vector_multiply``. To put it another
way, there's no relationship between the value of ``K`` in different
signatures. This is important: it would be inconvenient if every axis named ``K``
were constrained to have the same value throughout the entire program.
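Continuing the ``Matrix``/``Vector`` sketch above, ``K`` binds afresh at each
call site:
::
c: Matrix[Literal[8], Literal[32]] = Matrix()
matrix_vector_multiply(a, b)  # K is bound to Literal[64] for this call
matrix_vector_multiply(c, b)  # K is bound to Literal[8]; no conflict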
The disadvantage of this approach is that we have no ability to enforce shape semantics across
different calls. For example, we can't address the problem mentioned in `Motivation`_: if
one function returns an array with leading dimensions 'Time × Batch', and another function
takes the same array assuming leading dimensions 'Batch × Time', we have no way of detecting this.
The main advantage is that in some cases, axis sizes really are what we care about. This is true
not only for simple linear algebra operations such as the matrix manipulations above, but also for more
complicated transformations such as convolutional layers in neural networks, where it would be of
great utility to the programmer to be able to inspect the array size after each layer using
static analysis. To aid this, in the future we would like to explore possibilities for additional
type operators that enable arithmetic on array shapes - for example:
::
def repeat_each_element(x: Array[N]) -> Array[Mul[2, N]]: ...
Such arithmetic type operators would only make sense if names such as ``N`` refer to axis size.
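As an illustration only - neither ``Div`` nor the syntax below exists; this is
purely a sketch of the direction - a stride-2 convolution might one day
advertise how it shrinks its spatial axes:
::
H = TypeVar('H')
W = TypeVar('W')

# Hypothetical Div operator, by analogy with the Mul example above
def conv_stride2(x: Array[H, W]) -> Array[Div[H, 2], Div[W, 2]]: ...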
Use Case 2: Specifying Shape Semantics
--------------------------------------
A second approach (the one that most of the examples in this PEP are based around)
is to forgo annotation with actual axis size, and instead annotate axis *type*.
This would enable us to solve the problem of enforcing shape properties across calls.
For example:
::
# lib.py
class Batch: pass
class Time: pass
def make_array() -> Array[Batch, Time]: ...
# user.py
from lib import Batch, Time
# `Batch` and `Time` have the same identity as in `lib`,
# so this function must be given arrays as produced by `lib.make_array`
def use_array(x: Array[Batch, Time]): ...
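This is exactly the 'Time × Batch' versus 'Batch × Time' confusion described
in `Motivation`_. As a sketch of how a type checker could then catch such a
mix-up (reusing ``Array``, ``Batch`` and ``Time`` from above):
::
def make_transposed() -> Array[Time, Batch]: ...

use_array(make_array())       # OK: axis types match
use_array(make_transposed())  # Error: expected Array[Batch, Time],
                              #        got Array[Time, Batch]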
Note that in this case, names are *global* (to the extent that we use the
same ``Batch`` type in different places). However, because names refer only
to axis *types*, this doesn't constrain the *value* of those axes to be
the same throughout (that is, this doesn't constrain all axes named ``Height``
to have a value of, say, 480 everywhere they appear).
The argument *for* this approach is that in many cases, axis *type* is the more
important thing to verify; we care more about which axis is which than what the
specific size of each axis is.
It also does not preclude cases where we wish to describe shape transformations
without knowing the type ahead of time. For example, we can still write:
::
K = TypeVar('K')
N = TypeVar('N')
def matrix_vector_multiply(x: Array[K, N], y: Array[N]) -> Array[K]: ...
We can then use this with:

::
class Batch: pass
class Values: pass
batch_of_values: Array[Batch, Values]
value_weights: Array[Values]
matrix_vector_multiply(batch_of_values, value_weights)
# Result is Array[Batch]
The disadvantages are the inverse of the advantages from use case 1.
In particular, this approach does not lend itself well to arithmetic
on axis types: ``Mul[2, Batch]`` would be as meaningless as ``2 * int``.
Discussion
----------
Note that use cases 1 and 2 are mutually exclusive in user code. Users
can verify size or semantic type but not both.
As of this PEP, we are agnostic about which approach will provide most benefit.
Since the features introduced in this PEP are compatible with both approaches, however,
we leave the door open.
Why Not Both?
-------------
Consider the following 'normal' code:
::
def f(x: int): ...
Note that we have symbols for both the value of the thing (``x``) and the type of
the thing (``int``). Why can't we do the same with axes? For example, with an imaginary
syntax, we could write:
::
def f(array: Array[TimeValue: TimeType]): ...
This would allow us to access the axis size (say, 32) through the symbol ``TimeValue``
*and* the type through the symbol ``TimeType``.
This might even be possible using existing syntax, through a second level of parameterisation:
::
def f(array: Array[TimeValue[TimeType]]): ...
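One plausible reading of this, sketched with today's generics (``Time`` here
is a hypothetical axis type whose parameter carries the size):
::
N = TypeVar('N')

class Time(Generic[N]): ...  # semantic label, parameterised by axis size

def f(array: Array[Time[Literal[32]]]): ...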
However, we leave exploration of this approach to the future.
Appendix B: Shaped Types vs Named Axes
======================================
An issue related to those addressed by this PEP concerns
axis *selection*. For example, if we have an image stored in an array of shape 64×64×3,
we might wish to convert to black-and-white by computing the mean over the third
axis, ``mean(image, axis=2)``. Unfortunately, the simple typo ``axis=1`` is
difficult to spot and will produce a result that means something completely different
(all while likely allowing the program to keep on running, resulting in a bug
that is serious but silent).
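To see how easy the slip is to make - and how silently it passes - consider
the following short NumPy snippet:
::
import numpy as np

image = np.zeros((64, 64, 3))
grey = image.mean(axis=2)  # intended: average over channels; shape (64, 64)
oops = image.mean(axis=1)  # typo: averages over width instead; shape (64, 3)
# Both lines run without error; only the shapes (and the meaning) differ.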
In response, some libraries have implemented so-called 'named tensors' (in this context,
'tensor' is synonymous with 'array'), in which axes are selected not by index but by
label - e.g. ``mean(image, axis='channels')``.
A question we are often asked about this PEP is: why not just use named tensors?
The answer is that we consider the named tensors approach insufficient, for two main reasons:
* **Static checking** of shape correctness is not possible. As mentioned in `Motivation`_,
this is a highly desirable feature in machine learning code where iteration times
are slow by default.
* **Interface documentation** is still not possible with this approach. If a function should
*only* be willing to take array arguments that have image-like shapes, this cannot be stipulated
with named tensors.
Additionally, there's the issue of **poor uptake**. At the time of writing, named tensors
have only been implemented in a small number of numerical computing libraries. Possible explanations for this
include difficulty of implementation (the whole API must be modified to allow selection by axis name
instead of index), and lack of usefulness due to the fact that axis ordering conventions are often
strong enough that axis names provide little benefit (e.g. when working with images, 3D tensors are
basically *always* height × width × channels). However, ultimately we are still uncertain
why this is the case.
Can the named tensors approach be combined with the approach we advocate for in
this PEP? We're not sure. One area of overlap is that in some contexts, we could do, say:
::
Image: Array[Height, Width, Channels]
im: Image
mean(im, axis=Image.axes.index(Channels))
Ideally, we might write something like ``im: Array[Height=64, Width=64, Channels=3]`` -
but this won't be possible in the short term, due to the rejection of PEP 637.
In any case, our attitude towards this is mostly "Wait and see what happens before
taking any further steps".
Footnotes
==========
.. [#batch] 'Batch' is machine learning parlance for 'a number of'.
.. [#array] We use the term 'array' to refer to a matrix with an arbitrary
number of dimensions.