Wed Apr 19 2023
Tue Apr 18 2023

Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models

Latent Diffusion Models
Video generation
Simulation of in-the-wild driving data
Creative content creation with text-to-video modeling
Personalized text-to-video generation

This paper shows how Latent Diffusion Models (LDMs) can be used for high-resolution video generation. An LDM is first pre-trained on images only; a temporal dimension is then introduced into the latent-space diffusion model, which is fine-tuned on encoded image sequences (videos). The approach was validated on real driving videos at a resolution of 512 x 1024, where it achieved state-of-the-art performance. The trained temporal layers generalize to different fine-tuned text-to-image LDMs, which enables personalized text-to-video generation.
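
The idea of adding temporal layers on top of a per-frame image model can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the summary does not describe the layer design, so the neighbour-averaging `temporal_smooth`, the blending factor `alpha`, and the flat-list `Frame` type are hypothetical stand-ins, not the paper's architecture.

```python
from typing import List

Frame = List[float]  # one flattened latent frame (hypothetical toy representation)


def temporal_smooth(frames: List[Frame]) -> List[Frame]:
    """Toy 'temporal layer': average each frame with its neighbours in time."""
    out = []
    for t in range(len(frames)):
        lo, hi = max(0, t - 1), min(len(frames), t + 2)
        window = frames[lo:hi]
        # zip(*window) groups the same latent position across the window's frames.
        out.append([sum(vals) / len(window) for vals in zip(*window)])
    return out


def blend_temporal(frames: List[Frame], alpha: float) -> List[Frame]:
    """Blend per-frame latents with temporally mixed latents.

    alpha = 1.0 returns the per-frame latents unchanged (pure image model);
    alpha < 1.0 lets information flow between neighbouring frames.
    """
    mixed = temporal_smooth(frames)
    return [
        [alpha * s + (1.0 - alpha) * m for s, m in zip(spatial, temporal)]
        for spatial, temporal in zip(frames, mixed)
    ]
```

With `alpha = 1.0` the sketch reduces to the image-only model, which is one plausible way to picture why temporal layers could be attached to different image backbones without disturbing them.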

Businesses can use this research to improve their video generation capabilities and enhance customer engagement through personalized video content.

Generative Disco: Text-to-Video Generation for Music Visualization

Text-to-image models
Video generation
Music visualization

This paper presents Generative Disco, a generative AI system that helps users create music visualizations with large language models and text-to-image models. Users select intervals of music to visualize and parameterize each interval with a start prompt and an end prompt; the visuals are then generated in time with the beat of the music, producing audio-reactive video. The paper also introduces design patterns for improving generated videos. A study with professionals found the system enjoyable, easy to explore, and highly expressive.
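
One way to picture beat-driven generation between a start prompt and an end prompt is to give each frame a blending weight that advances on every beat. This is a minimal sketch under stated assumptions: the function name `beat_weights`, the equal-step schedule, and the linear fallback are hypothetical and are not taken from the Generative Disco system.

```python
from typing import List


def beat_weights(start: float, end: float, fps: int, beats: List[float]) -> List[float]:
    """Return one weight per frame: 0.0 = start prompt, 1.0 = end prompt.

    The weight advances by an equal step at every beat inside (start, end].
    If the interval contains no beats, fall back to a linear ramp.
    """
    in_range = sorted(b for b in beats if start < b <= end)
    n_frames = int(round((end - start) * fps))
    weights = []
    for i in range(n_frames):
        t = start + i / fps  # timestamp of frame i in seconds
        if in_range:
            passed = sum(1 for b in in_range if b <= t)
            weights.append(passed / len(in_range))
        else:
            weights.append(i / max(1, n_frames - 1))
    return weights
```

Each weight could then be used to interpolate between the two prompts' conditioning when rendering the corresponding frame, so that visual changes land on the beat.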

Businesses in the music industry can use this research to improve their music visualization capabilities and enhance customer engagement through visually appealing content.

Text2Performer: Text-Driven Human Video Generation

Diffusion-based motion sampler
Video generation
Human-centric video generation

This paper presents Text2Performer, a system that generates human videos with articulated motions from text describing the appearance and movements of a target performer. The system uses a decomposed human representation to maintain a consistent appearance and a diffusion-based motion sampler that generates continuous pose embeddings for better motion modeling. The paper also introduces the Fashion-Text2Video dataset, with manually annotated action labels and text descriptions. Results show that Text2Performer generates high-quality human videos with diverse appearances and flexible motions.

Businesses can use this research to improve their video generation capabilities and enhance customer engagement through visually appealing and personalized content.

Avatars Grow Legs: Generating Smooth Human Motion from Sparse Tracking Inputs with Diffusion Model

Motion tracking
Computer vision
Machine learning
AR/VR applications

This paper presents AGRoL, a novel conditional diffusion model designed to track the full body given only sparse upper-body tracking signals. It predicts accurate and smooth full-body motion, including the particularly challenging lower-body movement. The model outperforms state-of-the-art methods in the accuracy and smoothness of the generated motion.

The AGRoL model can improve the realism and accuracy of 3D full-body avatars, a highly demanded feature in AR/VR applications. It can be useful for businesses that develop AR/VR solutions.

Hyperbolic Image-Text Representations

Multi-modal data
Computer vision
Natural language processing

MERU is a contrastive model that yields hyperbolic representations of images and text, capturing the underlying hierarchy in image-text data. The model learns a highly interpretable representation space while being competitive with CLIP's performance on multi-modal tasks like image classification and image-text retrieval.
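
To make "hyperbolic representations" concrete, the sketch below computes the geodesic distance between two points in the Poincare ball, one standard model of hyperbolic space. The summary does not say which model of hyperbolic geometry MERU uses, so the choice of the Poincare ball and the function name `poincare_distance` are assumptions made purely for illustration.

```python
import math
from typing import Sequence


def poincare_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Geodesic distance between two points strictly inside the unit Poincare ball."""
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))
```

The same Euclidean gap corresponds to a larger hyperbolic distance near the boundary of the ball than near the origin. That exponential growth of space away from the origin is what makes hyperbolic geometry a natural fit for tree-like hierarchies.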

MERU can improve the interpretability of visual and linguistic concepts by explicitly capturing the hierarchy in image-text data, making it useful for businesses that work with large-scale vision and language models.

Mon Apr 17 2023
Sun Apr 16 2023
Thu Apr 13 2023
Wed Apr 12 2023