Self-conditioned Embedding Diffusion for Text Generation
Figure: Comparison of sample quality and diversity.
Continuous diffusion models could bring to natural language the same performance breakthrough they delivered for image generation.
To circumvent the discrete nature of text data, we simply project tokens into a continuous space of embeddings, as is standard in language modeling.
We propose self-conditioned embedding diffusion, a continuous diffusion mechanism that operates on token embeddings and enables learning flexible and scalable diffusion models for both conditional and unconditional text generation.
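As a rough illustration of this idea, the sketch below embeds discrete tokens into a small continuous space and applies Gaussian noise to those embeddings. The embedding size, normalization, and cosine noise schedule are illustrative assumptions, not the paper's exact hyperparameters.

```python
# Hypothetical sketch of the forward (noising) process on token embeddings.
import math
import torch
import torch.nn.functional as F

VOCAB_SIZE, EMBED_DIM = 32000, 64          # small, fixed-size embeddings (assumed values)
embed = torch.nn.Embedding(VOCAB_SIZE, EMBED_DIM)

def alpha_bar(t: torch.Tensor) -> torch.Tensor:
    """Cosine noise schedule: fraction of signal kept at time t in [0, 1]."""
    return torch.cos(0.5 * math.pi * t) ** 2

def noise_embeddings(token_ids: torch.Tensor, t: torch.Tensor):
    """Project discrete tokens to embeddings, then add Gaussian noise at level t."""
    x0 = F.normalize(embed(token_ids), dim=-1)      # clean embeddings
    eps = torch.randn_like(x0)                      # Gaussian noise
    a = alpha_bar(t).view(-1, 1, 1)                 # broadcast over (batch, seq, dim)
    xt = a.sqrt() * x0 + (1 - a).sqrt() * eps       # noised latents
    return xt, x0, eps
```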
Authors
Robin Strudel, Corentin Tallec, Florent Altché, Yilun Du, Yaroslav Ganin, Arthur Mensch, Will Grathwohl, Nikolay Savinov, Sander Dieleman, Laurent Sifre, Rémi Leblond
Continuous diffusion models have taken the world of image generation by storm, advancing the state of the art further than ever before.
We introduce self-conditioned embedding diffusion (SED), the first continuous diffusion approach for text with good scaling properties (we test models with up to 420M parameters).
We analyze several continuous text diffusion settings, and identify self-conditioning and diffusion on small fixed embeddings as key factors to make continuous text diffusion work.
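Self-conditioning means the denoiser also receives its own previous estimate of the clean embeddings as an extra input. The minimal sketch below assumes a hypothetical `denoiser(inputs, t)` network; the 50% feedback probability during training follows common practice for self-conditioning and is an assumption here, not a detail confirmed in this summary.

```python
# Minimal self-conditioning sketch; `denoiser` is a hypothetical network, not the
# paper's exact architecture.
import torch

def denoise_step(denoiser, xt, t, x0_prev=None):
    """Predict clean embeddings x0, optionally conditioned on a previous estimate."""
    if x0_prev is None:
        x0_prev = torch.zeros_like(xt)          # no estimate yet: feed zeros
    inp = torch.cat([xt, x0_prev], dim=-1)      # condition on the model's own prediction
    return denoiser(inp, t)

def training_prediction(denoiser, xt, t):
    """Half the time, compute a first estimate without gradients and feed it back in."""
    if torch.rand(()) < 0.5:
        with torch.no_grad():
            x0_prev = denoise_step(denoiser, xt, t)
        return denoise_step(denoiser, xt, t, x0_prev)
    return denoise_step(denoiser, xt, t)
```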
We show that SED can rival autoregressive (AR) models on generic language tasks at similar model sizes.
Its samples achieve a better likelihood-entropy trade-off than AR samples, and are deemed comparable (if slightly worse) by human raters.
Result
We propose self-conditioned embedding diffusion (SED), the first generally capable continuous diffusion model for text generation.
SED models can perform both conditional and unconditional generation, and their performance rivals that of autoregressive (AR) models while being more flexible in use (e.g. enabling in-filling).
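In-filling is a natural fit for this setup: during reverse diffusion, the embeddings of known tokens can be clamped to noised versions of their ground-truth values at every step, so the model only has to generate the missing span. The sketch below reuses `embed`, `alpha_bar`, and `denoise_step` from the earlier snippets and is an illustrative sampler under those assumptions, not the paper's exact procedure.

```python
# Illustrative in-filling sampler: known positions are clamped, missing ones generated.
import torch
import torch.nn.functional as F

def infill(denoiser, token_ids, known_mask, num_steps=200):
    """Generate tokens where known_mask is False, keeping the known tokens fixed."""
    x0_known = F.normalize(embed(token_ids), dim=-1)    # embeddings of the known text
    xt = torch.randn_like(x0_known)                     # start from pure noise
    x0_prev = None
    for i in reversed(range(num_steps)):
        t = torch.full((token_ids.shape[0],), (i + 1) / num_steps)
        a = alpha_bar(t).view(-1, 1, 1)
        # clamp known positions to ground-truth embeddings noised to the current level
        xt_known = a.sqrt() * x0_known + (1 - a).sqrt() * torch.randn_like(x0_known)
        xt = torch.where(known_mask[..., None], xt_known, xt)
        x0_prev = denoise_step(denoiser, xt, t, x0_prev)   # predict clean embeddings
        # re-noise the estimate down to the next (lower) noise level
        a_next = alpha_bar(t * i / (i + 1)).view(-1, 1, 1)
        xt = a_next.sqrt() * x0_prev + (1 - a_next).sqrt() * torch.randn_like(x0_prev)
    # decode by scoring generated embeddings against the embedding table
    logits = x0_prev @ embed.weight.T
    return torch.where(known_mask, token_ids, logits.argmax(-1))
```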
We demonstrate their performance and study the impact of the main design choices.