MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model
Human motion modeling is important for many modern graphics applications, yet
creating realistic motions typically requires professional skills. To remove
this skill barrier for laypeople, recent motion generation methods can directly
generate human motions conditioned on natural-language descriptions. However,
it remains challenging to achieve diverse and fine-grained motion generation
with various text inputs.
To address this problem, we propose MotionDiffuse, the first diffusion
model-based text-driven motion generation framework, which demonstrates several
desirable properties over existing methods. 1) Probabilistic Mapping. Instead of
a deterministic language-motion mapping, MotionDiffuse generates motions
through a series of denoising steps in which variations are injected. 2)
Realistic Synthesis. MotionDiffuse excels at modeling complicated data
distributions and generating vivid motion sequences. 3) Multi-Level
Manipulation. MotionDiffuse responds to fine-grained instructions on individual
body parts and supports arbitrary-length motion synthesis with time-varying
text prompts. Our
experiments show that MotionDiffuse outperforms existing state-of-the-art
(SoTA) methods by convincing margins on both text-driven motion generation and
action-conditioned motion
generation. A qualitative analysis further demonstrates MotionDiffuse's
controllability for comprehensive motion generation. Homepage:
this https URL
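
To make the "series of denoising steps in which variations are injected" concrete, below is a minimal sketch of DDPM-style reverse sampling for a text-conditioned motion model. It is an illustration under stated assumptions, not MotionDiffuse's actual implementation: the linear noise schedule, the pose dimensionality, and the `denoiser` interface are all assumptions, and the toy denoiser stands in for the paper's text-conditioned network.

```python
import torch

T = 1000                                   # number of diffusion steps (assumption)
betas = torch.linspace(1e-4, 0.02, T)      # linear noise schedule (assumption)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

def sample_motion(denoiser, text_emb, num_frames=60, pose_dim=263):
    """Reverse diffusion: start from Gaussian noise and denoise step by step."""
    x = torch.randn(1, num_frames, pose_dim)                # x_T ~ N(0, I)
    for t in reversed(range(T)):
        # the model predicts the noise added at step t, conditioned on text
        eps_hat = denoiser(x, torch.tensor([t]), text_emb)
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps_hat) / torch.sqrt(alphas[t])
        if t > 0:
            # fresh noise injected at every step -- the source of the
            # probabilistic (non-deterministic) language-to-motion mapping
            x = mean + torch.sqrt(betas[t]) * torch.randn_like(x)
        else:
            x = mean
    return x  # motion sequence of shape (1, num_frames, pose_dim)

# Toy denoiser so the sketch runs end to end; a real model would be a
# text-conditioned network trained to predict the injected noise.
toy = lambda x, t, c: torch.zeros_like(x)
motion = sample_motion(toy, text_emb=torch.zeros(1, 512))
print(motion.shape)  # torch.Size([1, 60, 263])
```

Because a new noise term enters at every step, repeated sampling with the same prompt yields different motions, in contrast to deterministic language-to-motion mappings.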