Multi-Person 3D Motion Prediction with Multi-Range Transformers
We propose a novel framework for multi-person 3D motion trajectory
prediction. Our key observation is that a person's actions and behavior may
strongly depend on the other people around them. Thus, instead of predicting each
human pose trajectory in isolation, we introduce a Multi-Range Transformers
model, which consists of a local-range encoder for individual motion and a
global-range encoder for social interactions. The Transformer decoder then
predicts the future motion of each person by taking the corresponding pose as a query,
which attends to both local-range and global-range encoder features. Our model not
only outperforms state-of-the-art methods on long-term 3D motion prediction,
but also generates diverse social interactions. More interestingly, our model
can even predict the motion of 15 people simultaneously by automatically dividing
them into different interaction groups. The project page with code is available
at this https URL.
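
For concreteness, the following is a minimal PyTorch sketch of the multi-range design described above. It is an illustrative assumption based only on this abstract, not the authors' implementation: the pose dimension, layer counts, single-step decoding, and the omission of positional encodings are all placeholder choices.

import torch
import torch.nn as nn

class MultiRangeTransformer(nn.Module):
    """Sketch: local-range encoder per person, global-range encoder over all
    persons, and a decoder whose per-person pose query attends to both."""

    def __init__(self, pose_dim=45, d_model=128, n_heads=8, n_layers=3):
        super().__init__()
        self.embed = nn.Linear(pose_dim, d_model)
        self.local_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            n_layers)
        self.global_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            n_layers)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True),
            n_layers)
        self.head = nn.Linear(d_model, pose_dim)

    def forward(self, motion):
        # motion: (batch, n_persons, n_frames, pose_dim) of observed 3D poses.
        # Positional/temporal encodings are omitted here for brevity.
        B, N, T, _ = motion.shape
        x = self.embed(motion)                        # (B, N, T, d_model)
        # Local range: encode each person's own motion independently.
        local = self.local_encoder(x.flatten(0, 1)).view(B, N, T, -1)
        # Global range: encode all persons jointly to capture interactions.
        global_feat = self.global_encoder(x.flatten(1, 2))  # (B, N*T, d_model)
        preds = []
        for i in range(N):
            # Person i's most recent pose embedding serves as the decoder query.
            query = x[:, i, -1:, :]
            # The query attends over concatenated local- and global-range features.
            memory = torch.cat([local[:, i], global_feat], dim=1)
            preds.append(self.head(self.decoder(query, memory)))
        return torch.cat(preds, dim=1)                # (B, N, pose_dim) next poses

As a usage example, model(torch.randn(2, 15, 50, 45)) would map 50 observed frames of 15 people to one predicted next pose per person; the long-term prediction reported in the paper would require decoding a full future trajectory, e.g. autoregressively or with multiple queries.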
Authors
Jiashun Wang, Huazhe Xu, Medhini Narasimhan, Xiaolong Wang