Your ViT is Secretly a Hybrid Discriminative-Generative Diffusion Model
We introduce a new generative model called Generative Vision Transformer (GenViT), and extend it to hybrid discriminative-generative modeling with the Hybrid ViT (HybViT).
Our work is among the first to explore a single ViT for joint image generation and classification.
We conduct a series of experiments to analyze the performance of the proposed models and demonstrate their superiority over prior state-of-the-art methods in both generative and discriminative tasks.
Authors
Xiulong Yang, Sheng-Min Shih, Yinlin Fu, Xiaoting Zhao, Shihao Ji