A$^3$T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing