Robustly Optimized and Distilled Training for Natural Language Understanding - 42Papers