We introduce a transformer-based, classifier-free diffusion generative model for the human motion domain.
Our model is trained with lightweight resources and yet achieves state-of-the-art results on leading benchmarks for text-to-motion and action-to-motion.
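To make the setup concrete, below is a minimal sketch of a transformer denoiser for motion diffusion: it takes a noised motion sequence, a diffusion timestep, and a condition embedding (e.g., text or action) and predicts the clean motion. This is an illustrative assumption, not the paper's architecture; all dimensions and the conditioning scheme are placeholders.

```python
# Hedged sketch of a conditional transformer denoiser for motion sequences.
# Shapes and hyperparameters are illustrative only.
import torch
import torch.nn as nn

class MotionDenoiser(nn.Module):
    def __init__(self, pose_dim=263, latent_dim=512, n_layers=8, n_heads=8):
        super().__init__()
        self.in_proj = nn.Linear(pose_dim, latent_dim)    # per-frame pose -> latent
        self.t_embed = nn.Embedding(1000, latent_dim)     # diffusion timestep embedding
        self.c_proj = nn.Linear(512, latent_dim)          # condition (text/action) embedding
        layer = nn.TransformerEncoderLayer(latent_dim, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.out_proj = nn.Linear(latent_dim, pose_dim)   # back to pose space

    def forward(self, x_t, t, c):
        # x_t: (B, T, pose_dim) noised motion, t: (B,) timesteps, c: (B, 512) condition
        h = self.in_proj(x_t)
        tok = (self.t_embed(t) + self.c_proj(c)).unsqueeze(1)  # one conditioning token
        h = self.encoder(torch.cat([tok, h], dim=1))
        return self.out_proj(h[:, 1:])                          # drop the conditioning token

x0_hat = MotionDenoiser()(torch.randn(2, 60, 263),
                          torch.randint(0, 1000, (2,)),
                          torch.randn(2, 512))
```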
Diffusion frameworks have achieved performance comparable to state-of-the-art image generation models, and their powerful noise-to-image denoising pipelines have sparked interest in variants for discriminative tasks.
This paper proposes a novel framework that represents instances as instance-aware filters and formulates instance segmentation as a noise-to-filter denoising process.
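The following is a rough, simplified sketch of the noise-to-filter idea: random vectors are iteratively denoised into instance-aware filters and then applied to an image feature map to produce instance masks. The `filter_denoiser` callable, shapes, and step count are assumptions for illustration, not the paper's method.

```python
# Hedged sketch: denoise random vectors into instance filters, then use them
# as dynamic convolution weights over image features to get masks.
import torch

def noise_to_filter_sampling(filter_denoiser, feat, n_instances=20, n_steps=10, filter_dim=64):
    """feat: (B, C, H, W) image features with C == filter_dim.
    filter_denoiser is a hypothetical network: (filters, t, feat) -> filters."""
    B = feat.shape[0]
    filters = torch.randn(B, n_instances, filter_dim)   # start from pure noise
    for step in reversed(range(n_steps)):
        t = torch.full((B,), step, dtype=torch.long)
        # one denoising update; a full sampler would also re-inject scaled noise
        filters = filter_denoiser(filters, t, feat)
    # each denoised filter produces one instance mask via dynamic convolution
    masks = torch.einsum('bnc,bchw->bnhw', filters, feat).sigmoid()
    return masks
```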
Diffusion models produce images with high quality and customizability, enabling them to be used for commercial art and graphic design purposes.
In this work, we study image retrieval frameworks that enable us to compare generated images with training samples and detect when content has been replicated.
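As a minimal illustration of this kind of retrieval check (a generic sketch, not the paper's specific pipeline), one can embed a generated image and the training images with any feature extractor and flag training samples whose similarity exceeds a threshold; the extractor and threshold below are placeholders.

```python
# Hedged sketch: nearest-neighbor check between a generated image embedding
# and precomputed training-set embeddings.
import torch
import torch.nn.functional as F

def find_replication(gen_feat, train_feats, threshold=0.9):
    """gen_feat: (D,) embedding of one generated image; train_feats: (N, D)."""
    sims = F.cosine_similarity(gen_feat.unsqueeze(0), train_feats)  # (N,)
    top_sim, top_idx = sims.max(dim=0)
    # return the best-matching training sample if it looks like a replication
    return (top_sim.item(), top_idx.item()) if top_sim > threshold else None
```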
With the wider application of deep neural networks (DNNs) in various algorithms
and frameworks, security threats have become a growing concern. Adversarial
attacks disturb DNN-based image classifiers, i
Deep learning shows great potential in generation tasks thanks to deep latent
representations. Generative models are a class of models that can generate
observations randomly with respect to certain im
We introduce a joint diffusion model that simultaneously learns meaningful
internal representations fit for both generative and predictive tasks. Joint
machine learning models that allow synthesizing
Have you ever thought that you could be an intelligent painter? This means that
you could paint a picture with a few expected objects in mind, or with a
desired scene. This is different from normal inpa
Recent breakthroughs in text-to-image synthesis have been driven by diffusion
models trained on billions of image-text pairs. Adapting this approach to 3D
synthesis would require large-scale datasets
We apply denoising diffusion probabilistic models to text generation in image captioning tasks.
We show that our non-autoregressive text decoder is capable of generating image captions using significantly fewer inference steps than autoregressive models.
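Below is a rough sketch of what non-autoregressive diffusion decoding can look like: all token positions are denoised in parallel for a small, fixed number of steps and then rounded to the nearest word embeddings, rather than emitting one token per forward pass. The `denoiser` callable, rounding rule, and step count are assumptions for illustration, not the paper's exact procedure.

```python
# Hedged sketch: parallel denoising of continuous token embeddings, then
# rounding each position to its nearest vocabulary embedding.
import torch

def nonautoregressive_decode(denoiser, image_feat, word_emb, seq_len=20, n_steps=8):
    """word_emb: (V, D) vocabulary embedding table; image_feat: (1, D_img).
    denoiser is a hypothetical network: (x, t, image_feat) -> x."""
    x = torch.randn(1, seq_len, word_emb.shape[1])       # start from Gaussian noise
    for step in reversed(range(n_steps)):
        t = torch.full((1,), step, dtype=torch.long)
        x = denoiser(x, t, image_feat)                   # denoise every position at once
    # round each continuous embedding to the nearest vocabulary entry
    token_ids = torch.cdist(x[0], word_emb).argmin(dim=-1)  # (seq_len,)
    return token_ids
```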
The goal of this paper is to augment a pre-trained text-to-image diffusion
model with the ability to ground open-vocabulary objects, i.e.,
simultaneously generating images and segmentation masks fo
Classifier guidance is a recently introduced method to trade off mode
coverage and sample fidelity in conditional diffusion models post training, in
the same spirit as low temperature sampling or trun
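To make the trade-off concrete, here is a minimal sketch of the standard guided-sampling step that combines conditional and unconditional noise predictions (a generic illustration, not necessarily this paper's exact formulation or notation): a scale of 0 recovers unconditional sampling, 1 recovers plain conditional sampling, and larger values trade diversity for sample fidelity.

```python
# Hedged sketch of guided noise prediction used inside a diffusion sampling loop.
import torch

def guided_noise_prediction(eps_cond, eps_uncond, guidance_scale=3.0):
    """Combine conditional and unconditional noise predictions from the same model."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

eps_c, eps_u = torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64)
eps_hat = guided_noise_prediction(eps_c, eps_u)  # used in place of eps at each sampling step
```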
Diffusion models (DMs) have recently emerged as a promising method in image
synthesis. They have surpassed generative adversarial networks (GANs) in both
diversity and quality, and have achieved impre
This paper provides the first sample complexity lower bounds for the
estimation of simple diffusion models, including the Bass model (used in
modeling consumer adoption) and the SIR model (used in mod
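For reference, the two models named above are commonly written in the following textbook forms (the notation here is generic and not necessarily the paper's): the Bass model describes the cumulative fraction of adopters F(t) with innovation coefficient p and imitation coefficient q, and the SIR model tracks susceptible, infected, and recovered populations S, I, R with transmission rate beta and recovery rate gamma.

```latex
% Bass diffusion model
\[ \frac{dF}{dt} = \bigl(p + q\,F(t)\bigr)\bigl(1 - F(t)\bigr) \]
% SIR epidemic model, with total population N = S + I + R
\[ \frac{dS}{dt} = -\beta \frac{SI}{N}, \qquad
   \frac{dI}{dt} = \beta \frac{SI}{N} - \gamma I, \qquad
   \frac{dR}{dt} = \gamma I. \]
```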
Denoising diffusion models hold great promise for generating diverse and
realistic human motions. However, existing motion diffusion models largely
disregard the laws of physics in the diffusion proce
Diffusion processes are important in several physical, chemical, biological
and human phenomena. Examples include molecular encounters in reactions,
cellular signalling, the foraging of animals, the s