Video Generation based on Language Descriptions
Towards Real-Time Text2Video via CLIP-Guided, Pixel-Level Optimization
We introduce an approach to generating videos from a sequence of language descriptions. Frames of the video are generated sequentially and optimized under guidance from the CLIP image-text encoder, iterating through the language descriptions and weighting the current description higher than the others. The approach can generate videos at up to 720p resolution, with variable frame rates and arbitrary aspect ratios, at a rate of 1-2 frames per second, a speed suitable for near real-time systems.
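The weighting of the current description against the others can be illustrated with a minimal sketch. The abstract states only that the current description is weighted higher; the specific scheme below (a fixed share for the current description, the remainder split evenly) and the names `description_weights`, `weighted_loss`, and `current_weight` are assumptions made for illustration, not the method itself. The per-description losses stand in for an image-text dissimilarity score and are supplied as plain numbers here rather than computed by an encoder.

```python
from typing import List, Sequence


def description_weights(num_descriptions: int, current: int,
                        current_weight: float = 0.7) -> List[float]:
    """Return one weight per description, summing to 1.

    The current description receives `current_weight`; the remainder is
    split evenly over the others. This split is an assumed scheme, chosen
    only so that the current description dominates.
    """
    if num_descriptions < 1:
        raise ValueError("need at least one description")
    if not 0 <= current < num_descriptions:
        raise ValueError("current index out of range")
    if num_descriptions == 1:
        return [1.0]
    if not 1.0 / num_descriptions < current_weight <= 1.0:
        raise ValueError("current_weight must exceed the uniform share")
    other = (1.0 - current_weight) / (num_descriptions - 1)
    return [current_weight if i == current else other
            for i in range(num_descriptions)]


def weighted_loss(per_description_losses: Sequence[float],
                  weights: Sequence[float]) -> float:
    """Combine per-description losses into one scalar guidance loss."""
    if len(per_description_losses) != len(weights):
        raise ValueError("losses and weights must have the same length")
    return sum(w * l for w, l in zip(weights, per_description_losses))


# Example: three descriptions, the second one is current.
weights = description_weights(3, current=1)
loss = weighted_loss([1.0, 2.0, 3.0], weights)
```

In a full system, each frame's pixels would be updated to reduce this combined loss, and `current` would advance as the video moves from one description to the next.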