We propose a notation for tensors with named axes, which relieves the author, reader, and future implementers from the burden of keeping track of the order of axes and the purpose of each.
It also makes it easy to extend operations on low-order tensors to higher order ones (e.g., to extend an operation on images to minibatches of images, or extend the attention mechanism to multiple attention heads).
After a brief overview of our notation, we illustrate it through several examples from modern machine learning, from building blocks like attention and convolution to full models like transformers and LeNet.