MUSE: Parallel Multi-Scale Attention for Sequence to Sequence Learning