L-Verse: Bidirectional Generation Between Image and Text