PEP 646: Add some broader context (#1904)
These sections were requested in some feedback we got when talking to NumPy and JAX folks. OK, this does make an already-long PEP even longer, but I do think this context is important - especially for folks who don't have strong intuitions for how parametric typing works in Python.
This commit is contained in:
parent 051b07cced
commit 5578160afa

pep-0646.rst | 206

@@ -610,6 +610,16 @@ manipulation mechanisms. We plan to introduce these in a future PEP.)

Rationale and Rejected Ideas
============================

Shape Arithmetic
----------------

Considering the use case of array shapes in particular, note that as of
this PEP, it is not yet possible to describe arithmetic transformations
of array dimensions - for example,
``def repeat_each_element(x: Array[N]) -> Array[2*N]``. We consider
this out-of-scope for the current PEP, but plan to propose additional
mechanisms that *will* enable this in a future PEP.
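
As a rough sketch of the limitation (reusing the ``Array`` class from the
examples elsewhere in this PEP; the ``Any``-sized return dimension is our own
workaround, not something this PEP prescribes), today such a function can only
leave the output dimension unconstrained:

::

    from typing import Any, TypeVar

    N = TypeVar('N')

    # The input is one-dimensional with size N, but without shape arithmetic
    # the doubled output size cannot be expressed, so the return type falls
    # back to an axis of unknown size.
    def repeat_each_element(x: Array[N]) -> Array[Any]: ...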

Supporting Variadicity Through Aliases
--------------------------------------

@@ -768,11 +778,205 @@ is available in `cpython/23527`_. A preliminary version of the version
using the star operator, based on an early implementation of PEP 637,
is also available at `mrahtz/cpython/pep637+646`_.

Appendix A: Shape Typing Use Cases
==================================

To give this PEP additional context for those particularly interested in the
array typing use case, in this appendix we expand on the different ways
this PEP can be used for specifying shape-based subtypes.

Use Case 1: Specifying Shape Values
-----------------------------------

The simplest way to parameterise array types is using ``Literal``
type parameters - e.g. ``Array[Literal[64], Literal[64]]``.

We can attach names to each parameter using normal type variables:

::

    K = TypeVar('K')
    N = TypeVar('N')

    def matrix_vector_multiply(x: Array[K, N], y: Array[N]) -> Array[K]: ...

    a: Array[Literal[64], Literal[32]]
    b: Array[Literal[32]]
    matrix_vector_multiply(a, b)
    # Result is Array[Literal[64]]

Note that such names have a purely local scope. That is, the name
``K`` is bound to ``Literal[64]`` only within ``matrix_vector_multiply``. To put it another
way, there's no relationship between the value of ``K`` in different
signatures. This is important: it would be inconvenient if every axis named ``K``
were constrained to have the same value throughout the entire program.
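
To illustrate (a brief sketch; the function names here are ours, reusing
``Array``, ``K`` and ``Literal`` from above):

::

    def normalise(x: Array[K]) -> Array[K]: ...
    def smooth(x: Array[K]) -> Array[K]: ...

    a: Array[Literal[64]]
    b: Array[Literal[32]]

    normalise(a)  # Here, K is bound to Literal[64]...
    smooth(b)     # ...while here, K is independently bound to Literal[32]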

The disadvantage of this approach is that we have no ability to enforce shape semantics across
different calls. For example, we can't address the problem mentioned in `Motivation`_: if
one function returns an array with leading dimensions 'Time × Batch', and another function
takes the same array assuming leading dimensions 'Batch × Time', we have no way of detecting this.
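
For instance (a sketch with made-up function names; because only sizes are
checked, the mix-up goes unnoticed whenever the two sizes happen to coincide):

::

    # Returns an array with leading dimensions Time x Batch...
    def get_history() -> Array[Literal[32], Literal[32]]: ...

    # ...but this function assumes leading dimensions Batch x Time.
    def process_batch(x: Array[Literal[32], Literal[32]]): ...

    # Type-checks, since 32 == 32 - yet it is semantically wrong.
    process_batch(get_history())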

The main advantage is that in some cases, axis sizes really are what we care about. This is true
not only for simple linear algebra operations such as the matrix manipulations above, but also for more
complicated transformations such as convolutional layers in neural networks, where it would be of
great utility to the programmer to be able to inspect the array size after each layer using
static analysis. To aid this, in the future we would like to explore possibilities for additional
type operators that enable arithmetic on array shapes - for example:

::

    def repeat_each_element(x: Array[N]) -> Array[Mul[2, N]]: ...

Such arithmetic type operators would only make sense if names such as ``N`` refer to axis size.

Use Case 2: Specifying Shape Semantics
--------------------------------------

A second approach (the one that most of the examples in this PEP are based around)
is to forgo annotation with actual axis size, and instead annotate axis *type*.

This would enable us to solve the problem of enforcing shape properties across calls.
For example:

::

    # lib.py

    class Batch: pass
    class Time: pass

    def make_array() -> Array[Batch, Time]: ...

    # user.py

    from lib import Batch, Time

    # `Batch` and `Time` have the same identity as in `lib`,
    # so this must take an array as produced by `lib.make_array`
    def use_array(x: Array[Batch, Time]): ...

Note that in this case, names are *global* (to the extent that we use the
same ``Batch`` type in different places). However, because names refer only
to axis *types*, this doesn't constrain the *value* of certain axes to be
the same throughout (that is, this doesn't constrain all axes named ``Height``
to have a value of, say, 480 throughout).
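
As a quick sketch (the variable names are ours):

::

    class Height: pass

    # Both annotations are fine: `Height` says what the axis means,
    # not how many elements it holds.
    thumbnail: Array[Height]   # might have 48 rows
    poster: Array[Height]      # might have 4800 rows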

The argument *for* this approach is that in many cases, axis *type* is the more
important thing to verify; we care more about which axis is which than what the
specific size of each axis is.

It also does not preclude cases where we wish to describe shape transformations
without knowing the type ahead of time. For example, we can still write:

::

    K = TypeVar('K')
    N = TypeVar('N')

    def matrix_vector_multiply(x: Array[K, N], y: Array[N]) -> Array[K]: ...

We can then use this with:

::

    class Batch: pass
    class Values: pass

    batch_of_values: Array[Batch, Values]
    value_weights: Array[Values]
    matrix_vector_multiply(batch_of_values, value_weights)
    # Result is Array[Batch]

The disadvantages are the inverse of the advantages from use case 1.
In particular, this approach does not lend itself well to arithmetic
on axis types: ``Mul[2, Batch]`` would be as meaningless as ``2 * int``.

Discussion
----------

Note that use cases 1 and 2 are mutually exclusive in user code. Users
can verify size or semantic type but not both.
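
Concretely (a sketch; the variable names and sizes are ours):

::

    # Either annotate axis sizes...
    image_by_size: Array[Literal[480], Literal[640]]

    # ...or annotate axis semantics - but a single annotation cannot do both.
    image_by_type: Array[Height, Width]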

As of this PEP, we are agnostic about which approach will provide the most benefit.
Since the features introduced in this PEP are compatible with both approaches, however,
we leave the door open.

Why Not Both?
-------------

Consider the following 'normal' code:

::

    def f(x: int): ...

Note that we have symbols for both the value of the thing (``x``) and the type of
the thing (``int``). Why can't we do the same with axes? For example, with an imaginary
syntax, we could write:

::

    def f(array: Array[TimeValue: TimeType]): ...

This would allow us to access the axis size (say, 32) through the symbol ``TimeValue``
*and* the type through the symbol ``TimeType``.

This might even be possible using existing syntax, through a second level of parameterisation:

::

    def f(array: Array[TimeValue[TimeType]]): ...

However, we leave exploration of this approach to the future.
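
As a very rough sketch of what we mean - entirely our own construction, not
something this PEP proposes - the axis type could itself be made generic in
its size:

::

    from typing import Generic, Literal, TypeVar

    V = TypeVar('V')

    # An axis type that also carries a size parameter.
    class Time(Generic[V]): pass

    # Semantic check via `Time`, size check via the `Literal` it wraps.
    def f(array: Array[Time[Literal[32]]]): ...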

Appendix B: Shaped Types vs Named Axes
======================================

An issue related to those addressed by this PEP concerns
axis *selection*. For example, if we have an image stored in an array of shape 64×64×3,
we might wish to convert to black-and-white by computing the mean over the third
axis, ``mean(image, axis=2)``. Unfortunately, the simple typo ``axis=1`` is
difficult to spot and will produce a result that means something completely different
(all while likely allowing the program to keep on running, resulting in a bug
that is serious but silent).
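
To make this concrete, here is what the mistake looks like in NumPy (a minimal
illustration; the variable names are ours):

::

    import numpy as np

    image = np.zeros((64, 64, 3))

    greyscale = image.mean(axis=2)  # shape (64, 64): what we wanted
    mistake = image.mean(axis=1)    # shape (64, 3): runs fine, silently wrong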

In response, some libraries have implemented so-called 'named tensors' (in this context,
'tensor' is synonymous with 'array'), in which axes are selected not by index but by
label - e.g. ``mean(image, axis='channels')``.

A question we are often asked about this PEP is: why not just use named tensors?
The answer is that we consider the named tensors approach insufficient, for two main reasons:

* **Static checking** of shape correctness is not possible. As mentioned in `Motivation`_,
  this is a highly desirable feature in machine learning code where iteration times
  are slow by default.
* **Interface documentation** is still not possible with this approach. If a function should
  *only* be willing to take array arguments that have image-like shapes, this cannot be stipulated
  with named tensors.

Additionally, there's the issue of **poor uptake**. At the time of writing, named tensors
have only been implemented in a small number of numerical computing libraries. Possible explanations for this
include difficulty of implementation (the whole API must be modified to allow selection by axis name
instead of index), and lack of usefulness due to the fact that axis ordering conventions are often
strong enough that axis names provide little benefit (e.g. when working with images, 3D tensors are
basically *always* height × width × channels). However, ultimately we are still uncertain
why this is the case.

Can the named tensors approach be combined with the approach we advocate for in
this PEP? We're not sure. One area of overlap is that in some contexts, we could do, say:

::

    Image = Array[Height, Width, Channels]
    im: Image
    mean(im, axis=Image.axes.index(Channels))

Ideally, we might write something like ``im: Array[Height=64, Width=64, Channels=3]`` -
but this won't be possible in the short term, due to the rejection of PEP 637.
In any case, our attitude towards this is mostly "Wait and see what happens before
taking any further steps".

Footnotes
==========

.. [#batch] 'Batch' is machine learning parlance for 'a number of'.

.. [#array] We use the term 'array' to refer to a matrix with an arbitrary
   number of dimensions.