HSPACE: Synthetic Parametric Humans Animated in Complex Environments
Advances in the state of the art for 3d human sensing are currently limited
by the lack of visual datasets with 3d ground truth that include multiple people
in motion, operating in real-world environments with complex illumination or
occlusion, and potentially observed by a moving camera. Sophisticated scene
understanding requires estimating human pose and shape as well as
gestures, towards representations that ultimately combine useful metric and
behavioral signals with free-viewpoint, photo-realistic visualization
capabilities. To sustain progress, we build a large-scale, photo-realistic
dataset, Human-SPACE (HSPACE), of animated humans placed in complex synthetic
indoor and outdoor environments. We combine a hundred diverse individuals of
varying age, gender, body proportions, and ethnicity with hundreds of motions and
scenes, as well as parametric variations in body shape (for a total of 1,600
different humans), in order to generate an initial dataset of over 1 million
frames. Human animations are obtained by fitting an expressive human body
model, GHUM, to single scans of people, followed by novel re-targeting and
positioning procedures that support the realistic animation of dressed humans,
statistical variation of body proportions, and jointly consistent scene
placement of multiple moving people. Assets are generated automatically, at
scale, and are compatible with existing real-time rendering and game engines.
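To make the animation machinery described above concrete, the following is a
minimal sketch of two core operations the abstract mentions: sampling
parametric body-shape variations around a fitted scan, and posing a skinned
mesh via linear blend skinning. All names, dimensions, and the Gaussian
perturbation scheme are illustrative assumptions; GHUM's actual
parameterization and the paper's re-targeting procedure are more involved.

import numpy as np

rng = np.random.default_rng(0)

def sample_shape_variation(beta_fit, sigma=0.3):
    """Perturb fitted shape coefficients to synthesize a new body.

    beta_fit: (B,) shape coefficients recovered by fitting the body model
              to a scan; sigma controls the spread of the variation.
    """
    return beta_fit + sigma * rng.standard_normal(beta_fit.shape)

def linear_blend_skinning(verts, weights, joint_transforms):
    """Pose rest-pose vertices with per-joint rigid transforms (LBS).

    verts:            (V, 3) rest-pose vertex positions
    weights:          (V, J) skinning weights; each row sums to 1
    joint_transforms: (J, 4, 4) homogeneous world transforms, one per joint
    """
    V = verts.shape[0]
    verts_h = np.concatenate([verts, np.ones((V, 1))], axis=1)     # (V, 4)
    # Blend per-joint transforms by the skinning weights, then apply.
    blended = np.einsum('vj,jab->vab', weights, joint_transforms)  # (V, 4, 4)
    posed = np.einsum('vab,vb->va', blended, verts_h)              # (V, 4)
    return posed[:, :3]

# Toy usage: a vertex fully bound to one joint translated by +1 on x.
verts = np.zeros((1, 3))
weights = np.ones((1, 1))
T = np.eye(4)[None]
T[0, 0, 3] = 1.0
assert np.allclose(linear_blend_skinning(verts, weights, T), [[1.0, 0.0, 0.0]])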
The dataset, together with an evaluation server, will be made available for
research. Our large-scale analysis of the impact of synthetic data, used in
combination with real data and weak supervision, underlines the considerable
potential for continued quality improvement and for narrowing the sim-to-real
gap in this practical setting, especially in connection with increased model
capacity.
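For concreteness, a back-of-the-envelope view of the dataset combinatorics
stated earlier in the abstract; the per-scan variation count is derived from
the stated totals, while the clip-length figure is an illustrative assumption,
not a reported dataset statistic.

# Scale arithmetic implied by the abstract's figures.
num_scans = 100                      # distinct scanned individuals
total_humans = 1_600                 # stated number of distinct humans
variations_per_scan = total_humans // num_scans
assert variations_per_scan == 16     # 16 body-shape variants per scanned person

frames_target = 1_000_000            # stated initial dataset size (frames)
frames_per_clip = 250                # assumed ~10 s clips at 25 fps
clips_needed = frames_target / frames_per_clip
print(f"{variations_per_scan} variants/scan, ~{clips_needed:.0f} animated clips")
# -> 16 variants/scan, ~4000 animated clips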