Accelerating Neural Transformer via an Average Attention Network