Fastformer: Additive Attention is All You Need
In this paper, we propose Fastformer, an efficient Transformer model for text understanding based on the additive attention mechanism.
We first use additive attention to model global contexts, and then further transform each token representation based on its interaction with the global context representations (a code sketch of this mechanism follows below).
Extensive experiments on five datasets show that Fastformer is much more efficient than many existing Transformer models, while achieving comparable or even better long-text modeling performance.
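To make the two described steps concrete, below is a minimal single-head PyTorch sketch: additive attention pools all token queries into one global context vector at linear cost, and each token is then transformed by its interaction with that context. This is an illustration under our own assumptions (the module name AdditiveAttentionSketch, the element-wise interaction, and the query residual are choices made here), not the paper's reference implementation, which uses multiple heads and a second additive-attention step over the key interactions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttentionSketch(nn.Module):
    # Hypothetical single-head module illustrating the two steps in the
    # abstract; not the authors' reference code.
    def __init__(self, d_model: int):
        super().__init__()
        self.query = nn.Linear(d_model, d_model)
        self.key = nn.Linear(d_model, d_model)
        # Learnable vector producing one additive-attention score per token.
        self.w_q = nn.Parameter(torch.randn(d_model) * d_model ** -0.5)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        q = self.query(x)                              # token queries
        k = self.key(x)                                # token keys
        # Step 1: additive attention pools the queries into one global
        # context vector; one scalar score per token, so cost is O(N).
        scores = q @ self.w_q / q.size(-1) ** 0.5      # (batch, seq_len)
        alpha = F.softmax(scores, dim=1)
        g = torch.einsum("bn,bnd->bd", alpha, q)       # global context
        # Step 2: transform each token by its interaction with the global
        # context (an element-wise product is one natural choice here).
        u = k * g.unsqueeze(1)                         # (batch, seq_len, d_model)
        return self.out(u) + q                         # residual on queries

# Usage: layer = AdditiveAttentionSketch(256)
#        y = layer(torch.randn(2, 128, 256))          # y: (2, 128, 256)

Because every token is scored against a single learnable vector rather than against every other token, the sketch avoids the quadratic pairwise attention map of the standard Transformer, which is the source of the efficiency gain claimed above.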