Monocular Dynamic View Synthesis: A Reality Check
We study recent progress on dynamic view synthesis (DVS) from monocular video.
Existing approaches have demonstrated impressive results, but we show a discrepancy between the practical capture process and the existing experimental protocols, which effectively leaks multi-view signals during training.
We introduce two new metrics: co-visibility masked image metrics and correspondence accuracy, which overcome the issue in existing protocols.
We define effective multi-view factors (EMFs) to quantify the amount of multi-view signal present in the input capture sequence based on the relative camera-scene motion.
We also propose a new dataset that includes more diverse real-life deformation sequences.
Using our proposed experimental protocol, we show that state-of-the-art approaches suffer a 1-2 dB drop in masked peak signal-to-noise ratio (PSNR) in the absence of multi-view cues and a 4-5 dB drop when modeling complex motion.
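To illustrate the idea behind a co-visibility masked image metric, here is a minimal sketch of PSNR restricted to a co-visibility mask. The function name, array shapes, and toy data are our own assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def masked_psnr(pred, gt, mask, max_val=1.0):
    """PSNR computed only over pixels marked as co-visible.

    pred, gt: float arrays in [0, max_val], shape (H, W, 3).
    mask: boolean array of shape (H, W); True = pixel was co-visible
          in the training views, so it is fair to evaluate there.
    """
    # Restrict the squared error to co-visible pixels before averaging.
    diff = (pred - gt)[mask]
    mse = np.mean(diff ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

# Toy example: uniform 0.1 error on co-visible pixels, but a large
# error at one pixel that was never co-visible during training.
gt = np.zeros((4, 4, 3))
pred = np.full((4, 4, 3), 0.1)
pred[0, 0] = 1.0                 # big error, but...
mask = np.ones((4, 4), dtype=bool)
mask[0, 0] = False               # ...that pixel is excluded by the mask

print(masked_psnr(pred, gt, mask))  # 20.0 dB (mse = 0.01 on masked region)
```

Masking out never-observed pixels prevents penalizing a method for regions it had no information about, which is the point of the proposed protocol.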
Authors
Hang Gao, Ruilong Li, Shubham Tulsiani, Bryan Russell, Angjoo Kanazawa