Single Image 3D Object Estimation with Primitive Graph Networks
Reconstructing 3D object from a single image (RGB or depth) is a fundamental
problem in visual scene understanding and yet remains challenging due to its
ill-posed nature and complexity in real-world scenes. To address those
challenges, we adopt a primitive-based representation for 3D object, and
propose a two-stage graph network for primitive-based 3D object estimation,
which consists of a sequential proposal module and a graph reasoning module.
Given a 2D image, our proposal module first generates a sequence of 3D
primitives from input image with local feature attention. Then the graph
reasoning module performs joint reasoning on a primitive graph to capture the
global shape context for each primitive. Such a framework is capable of taking
into account rich geometry and semantic constraints during 3D structure
recovery, producing 3D objects with more coherent structure even under
challenging viewing conditions. We train the entire graph neural network in a
stage-wise strategy and evaluate it on three benchmarks: Pix3D, ModelNet and
NYU Depth V2. Extensive experiments show that our approach outperforms the
previous state of the arts with a considerable margin.