GANsformer: A Multiplicative Transformer for Visual Generative Modeling
Generative Adversarial Transformers
We introduce the GANsformer, a novel and efficient type of transformer, and explore it for the task of visual generative modeling.
The network iteratively propagates information from a set of latent variables to the evolving visual features and vice versa, to support the refinement of each in light of the other and encourage the emergence of compositional representations of objects and scenes.
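The two-way information flow between latents and image features can be illustrated with a toy, pure-Python sketch. It is not the authors' implementation. It assumes single-head scaled dot-product attention with identity query/key/value projections and no learned weights. The function names `attend` and `bipartite_step` are hypothetical. The feature update uses a simplified multiplicative modulation, `x * (1 + a)`, as a stand-in for the model's multiplicative integration. In each step the latents first gather information from the features, and the features are then modulated by the updated latents.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: each query returns a softmax-weighted
    average of the value rows (identity projections assumed)."""
    d = len(keys[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def bipartite_step(latents: Matrix, features: Matrix) -> Tuple[Matrix, Matrix]:
    """One hypothetical round of latent <-> feature propagation."""
    # 1) Latents -> features direction: each latent aggregates image features
    #    (residual additive update, a simplifying assumption).
    gathered = attend(latents, features, features)
    latents = [[x + g for x, g in zip(l, a)] for l, a in zip(latents, gathered)]
    # 2) Features -> latents direction: each feature attends to the updated
    #    latents and is modulated multiplicatively by what it retrieves.
    retrieved = attend(features, latents, latents)
    features = [[x * (1.0 + g) for x, g in zip(f, a)]
                for f, a in zip(features, retrieved)]
    return latents, features


if __name__ == "__main__":
    lat = [[0.1, 0.2], [0.3, -0.1]]                           # k = 2 latents
    feat = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-0.2, 0.4]]  # n = 4 features
    for _ in range(3):                                        # iterate a few rounds
        lat, feat = bipartite_step(lat, feat)
    print(len(lat), len(feat))
```

Iterating `bipartite_step` mirrors the idea of refining each set "in light of the other". Because attention only ever runs between the two sets, never within the image features themselves, the pattern is bipartite.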
The network employs a bipartite structure that enables long-range interactions across the image while keeping the computation linear in the number of image features, so it can readily scale to high-resolution synthesis.
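Some quick arithmetic shows why a bipartite structure scales linearly. Suppose an image has n = H·W feature positions and the model uses k latents. Full self-attention over the image computes n² pairwise scores. Bipartite attention computes n·k scores per direction, which is linear in n when k is fixed. The helper `score_counts` is hypothetical. The values 256×256 and k = 16 are illustrative assumptions, not the paper's settings. The sketch counts attention scores only and ignores projections and constant factors.

```python
from typing import Tuple


def score_counts(height: int, width: int, num_latents: int) -> Tuple[int, int]:
    """Return (self-attention scores, bipartite scores) for one layer."""
    n = height * width                     # number of image feature positions
    self_attention = n * n                 # every position attends to every position
    bipartite = 2 * n * num_latents        # latents->features plus features->latents
    return self_attention, bipartite


if __name__ == "__main__":
    sa, bp = score_counts(256, 256, 16)
    print(sa, bp, sa // bp)  # → 4294967296 2097152 2048
```

Doubling the image area doubles the bipartite cost but quadruples the self-attention cost. This gap is what makes high-resolution synthesis tractable.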
We demonstrate the model's strength and robustness through a careful evaluation over a range of datasets, from simulated multi-object environments to rich real-world indoor and outdoor scenes. The results show that it achieves state-of-the-art image quality and diversity while enjoying fast learning and better data efficiency.