Temporally varying volumetric image-based rendering for dynamic scene synthesis
DynIBaR: Neural Dynamic Image-Based Rendering
We address the problem of synthesizing novel views from a monocular video depicting a complex dynamic scene. We adopt a volumetric image-based rendering framework that generates new viewpoints by aggregating features from nearby views in a scene-motion-aware manner.
Our system retains the advantages of prior methods in its ability to model complex scenes and view-dependent effects, but also enables synthesizing photo-realistic novel views from long videos featuring complex scene dynamics with unconstrained camera trajectories.
We demonstrate significant improvements over state-of-the-art methods on dynamic scene datasets, and also apply our approach to in-the-wild videos with challenging camera and object motion, where prior methods fail to produce high-quality renderings.
Authors
Zhengqi Li, Qianqian Wang, Forrester Cole, Richard Tucker, Noah Snavely
Computer vision methods can now produce reconstructions of static 3D scenes that allow for renderings of spectacular quality.
Novel view synthesis from a monocular video of a dynamic scene is a much more challenging scene reconstruction problem.
In this work, we present a novel approach that is scalable to dynamic scenes captured with 1) long time duration, 2) unconstrained scene configurations, 3) uncontrolled camera trajectories, and 4) fast and complex object motion.
We take inspiration from recent work on rendering of static scenes in which images are synthesized by aggregating local image features along epipolar lines from nearby views through ray transformers.
However, scenes that are in motion violate the epipolar constraints assumed by those methods.
We propose to aggregate multi-view image features in motion-adjusted ray space, which allows us to correctly reason about spatio-temporally varying scene geometry and appearance.
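To illustrate this idea, the following is a minimal NumPy sketch, not the paper's implementation; all function and variable names (e.g., `scene_flow_to_view`) are hypothetical. Samples along a target ray are first displaced by their estimated scene motion into a nearby view's time, and only then projected into that view to gather image features:

```python
import numpy as np

def project(points_world, K, w2c):
    """Project 3D world points (N, 3) into a camera with intrinsics K and
    world-to-camera extrinsics w2c (4, 4); returns (N, 2) pixel coordinates."""
    pts_h = np.concatenate([points_world, np.ones((len(points_world), 1))], axis=1)
    cam = (w2c @ pts_h.T).T[:, :3]
    uv = (K @ cam.T).T
    return uv[:, :2] / uv[:, 2:3]

def sample_features(feature_map, uv):
    """Nearest-neighbor lookup of per-pixel features at pixel coords uv (N, 2)."""
    h, w = feature_map.shape[:2]
    x = np.clip(np.round(uv[:, 0]).astype(int), 0, w - 1)
    y = np.clip(np.round(uv[:, 1]).astype(int), 0, h - 1)
    return feature_map[y, x]

def aggregate_motion_adjusted(ray_o, ray_d, depths, scene_flow_to_view,
                              src_feature_map, src_K, src_w2c):
    """Gather source-view features for samples along a target ray.

    Instead of projecting the samples directly (valid only for static scenes),
    each sample is first displaced by its estimated motion into the source
    view's time, so moving points land at the correct image locations.
    """
    pts = ray_o[None, :] + depths[:, None] * ray_d[None, :]   # samples along the ray
    pts_at_src_time = pts + scene_flow_to_view(pts)           # motion adjustment
    uv = project(pts_at_src_time, src_K, src_w2c)
    return sample_features(src_feature_map, uv)               # (num_samples, feat_dim)
```

In the full system, the per-view features gathered this way would then be fused across views (e.g., with a ray transformer) and decoded into color and density; the sketch only shows the motion-adjusted sampling step.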
To efficiently model scene motion across multiple views, we represent this motion with motion trajectory fields that span multiple frames, expressed using learned basis functions.
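To make the basis-function idea concrete, here is a minimal sketch (hypothetical names, not the released code) in which a point's trajectory across frames is a linear combination of a small set of shared temporal basis functions with per-point coefficients. In the method the basis is learned; a fixed DCT-style basis is used here only to keep the example self-contained:

```python
import numpy as np

def make_dct_basis(num_frames, num_basis):
    """A small set of smooth temporal basis functions (DCT-style stand-in
    for the learned basis)."""
    t = np.arange(num_frames)
    basis = [np.cos(np.pi * (t + 0.5) * k / num_frames) for k in range(num_basis)]
    return np.stack(basis, axis=1)          # (num_frames, num_basis)

def trajectory(point_xyz, coeffs, basis):
    """Displace a 3D point over time: coeffs has shape (num_basis, 3), so the
    per-frame offset is a linear combination of the basis functions."""
    offsets = basis @ coeffs                # (num_frames, 3)
    return point_xyz[None, :] + offsets     # (num_frames, 3) positions over time

num_frames, num_basis = 8, 4
basis = make_dct_basis(num_frames, num_basis)
coeffs = 0.01 * np.random.randn(num_basis, 3)   # in practice, predicted per point
traj = trajectory(np.array([0.0, 0.0, 2.0]), coeffs, basis)
print(traj.shape)   # (8, 3): the point's position at each of 8 frames
```

Representing motion this way means a point's position at any nearby frame follows from a few coefficients, rather than requiring a separate flow prediction per frame pair.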
Furthermore, to achieve temporal coherence of the dynamic scene reconstruction, we introduce a new temporal photometric loss that operates in motion-adjusted ray space.
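The temporal photometric loss can be sketched as follows; this is a simplified, hypothetical version, not the paper's exact formulation, and `flow_to_nearby_time` and `weights` are assumed inputs. Colors rendered for a ray at one time are compared against colors gathered from a nearby frame after the ray samples have been advected by their estimated motion, so the comparison happens in motion-adjusted ray space rather than at fixed pixel locations:

```python
import numpy as np

def sample_colors(image, uv):
    """Nearest-neighbor color lookup at pixel coords uv (N, 2)."""
    h, w = image.shape[:2]
    x = np.clip(np.round(uv[:, 0]).astype(int), 0, w - 1)
    y = np.clip(np.round(uv[:, 1]).astype(int), 0, h - 1)
    return image[y, x]

def temporal_photometric_loss(rendered_rgb, ray_points, flow_to_nearby_time,
                              nearby_image, nearby_K, nearby_w2c, weights):
    """Compare the rendered ray color against the nearby frame's colors at the
    motion-adjusted projections of the ray samples, composited with each
    sample's rendering weight (so empty or occluded samples contribute little)."""
    pts_nearby = ray_points + flow_to_nearby_time(ray_points)
    pts_h = np.concatenate([pts_nearby, np.ones((len(pts_nearby), 1))], axis=1)
    cam = (nearby_w2c @ pts_h.T).T[:, :3]
    uv = (nearby_K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    target_rgb = (weights[:, None] * sample_colors(nearby_image, uv)).sum(axis=0)
    return np.abs(rendered_rgb - target_rgb).mean()
```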
Our approach retains the advantages of volumetric scene representations, which can model intricate scene geometry and view-dependent effects, while significantly improving rendering fidelity for both static and dynamic scene content compared to recent methods.
Results
We present a new approach for space-time view synthesis from a monocular video depicting a complex dynamic scene.
We show that our method synthesizes photo-realistic novel views from in-the-wild dynamic videos and achieves significant improvements over prior state-of-the-art methods on dynamic scene benchmarks.