8-bit approximation of the softmax layer in multi-layer perceptron neural networks
Efficient Softmax Approximation for Deep Neural Networks with Attention Mechanism
In this paper, we propose two methods to approximate the softmax computation, both based on lookup tables (LUTs).
The required LUT size is small (about 700 bytes) because the ranges of the softmax numerator and denominator remain stable when normalization is applied to the input.
We have validated the proposed technique across several artificial-intelligence tasks (object detection, machine translation, sentiment analysis, and semantic equivalence) and deep neural network models, using a variety of benchmarks (COCO17, WMT14, WMT17, GLUE).
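To illustrate the general idea of a LUT-based softmax (a minimal sketch, not the paper's exact method: the table size, clipping range `R`, and quantization scheme below are assumptions), note that after subtracting the per-row maximum the inputs lie in a bounded range, so exp(x) can be tabulated once with 8-bit indices and reused for every numerator, with the denominator obtained by summing the looked-up values:

```python
import numpy as np

R = 8.0   # assumed clipping range for normalized logits (hypothetical choice)
N = 256   # 8-bit index -> 256-entry table
EXP_LUT = np.exp(np.linspace(-R, 0.0, N))  # precomputed exp table over [-R, 0]

def lut_softmax(x):
    """Approximate softmax using the precomputed exp lookup table."""
    x = np.asarray(x, dtype=np.float64)
    z = np.clip(x - x.max(), -R, 0.0)        # normalization keeps the range stable
    idx = np.round((z + R) / R * (N - 1)).astype(int)  # quantize to an 8-bit index
    num = EXP_LUT[idx]                        # numerator: table lookup per element
    return num / num.sum()                    # denominator: sum of the lookups

def exact_softmax(x):
    """Reference softmax for comparison."""
    e = np.exp(x - np.max(x))
    return e / e.sum()
```

With 256 entries over a fixed range, the table stays well under a kilobyte, consistent with the small LUT footprint reported in the abstract, and the approximation error is bounded by the quantization step of the table index.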