Primer: Searching for Efficient Transformers for Language Modeling - 42Papers