This document aims to be a self-contained, mathematically precise overview of transformer architectures and algorithms.
It covers what transformers are, how they are trained, what they are used for, their key architectural components, and a preview of the most prominent models.