Single-View 3D Object Reconstruction from Shape Priors in Memory
Existing methods for single-view 3D object reconstruction directly learn to
transform image features into 3D representations. However, these methods are
vulnerable to images containing noisy backgrounds and heavy occlusions because
the extracted image features do not contain enough information to reconstruct
high-quality 3D shapes. Humans routinely use incomplete or noisy visual cues
from an image to retrieve similar 3D shapes from their memory and reconstruct
the 3D shape of an object. Inspired by this, we propose a novel method, named
Mem3D, that explicitly constructs shape priors to supplement the missing
information in the image. Specifically, the shape priors take the form of
"image-voxel" pairs in a memory network, which are stored via a well-designed
writing strategy during training. We also propose a voxel triplet loss function
that helps retrieve, from the stored shape priors, precise 3D shapes that are
highly relevant to the input image.
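As a concrete illustration of how such voxel-guided retrieval could be trained, the sketch below implements a triplet loss in which memory entries whose stored voxels overlap a sample's ground-truth shape act as positives and the rest as negatives. This is a minimal PyTorch sketch under assumed interfaces; the function names (voxel_iou, voxel_triplet_loss), the IoU threshold, and the margin are illustrative choices, not the authors' implementation.

import torch
import torch.nn.functional as F

def voxel_iou(a, b, thresh=0.5):
    # IoU between binary voxel grids of shape (N, D, H, W)
    a = (a > thresh).float()
    b = (b > thresh).float()
    inter = (a * b).sum(dim=(1, 2, 3))
    union = ((a + b) > 0).float().sum(dim=(1, 2, 3))
    return inter / union.clamp(min=1e-6)

def voxel_triplet_loss(query, keys, gt_voxel, key_voxels,
                       margin=0.3, pos_iou=0.5):
    # query:      (C,)          image feature of the current sample
    # keys:       (M, C)        image features stored in the memory
    # gt_voxel:   (D, H, W)     ground-truth voxel grid of the sample
    # key_voxels: (M, D, H, W)  voxel grids stored in the memory
    ious = voxel_iou(gt_voxel.unsqueeze(0).expand_as(key_voxels), key_voxels)
    dists = (keys - query).norm(dim=1)          # feature-space distances
    pos, neg = ious >= pos_iou, ious < pos_iou  # voxel similarity defines labels
    if not pos.any() or not neg.any():
        return query.new_zeros(())
    # hardest positive (farthest) vs. hardest negative (closest)
    return F.relu(dists[pos].max() - dists[neg].min() + margin)

Minimizing such a loss pushes image features toward memory keys whose stored voxels resemble the ground truth, so that nearest-neighbor lookup in feature space returns shape-relevant entries at test time, when no ground-truth voxel is available.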
An LSTM-based shape encoder is then introduced to extract information from the
retrieved 3D shapes, which helps recover the 3D shape of an object that is
heavily occluded or situated in a complex environment.
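To make the role of the shape encoder concrete, here is a minimal sketch in the same spirit: each retrieved voxel grid is embedded by a small 3D CNN, and an LSTM fuses the sequence of retrieved shapes into a single prior vector that can be concatenated with the image features before decoding. The layer sizes, the assumed 32^3 voxel resolution, and the module names are illustrative assumptions, not the paper's exact design.

import torch
import torch.nn as nn

class ShapeEncoder(nn.Module):
    # Sketch of an LSTM-based shape encoder: a small 3D CNN embeds each
    # retrieved voxel grid, then an LSTM fuses the sequence of retrieved
    # shapes into one shape-prior vector.
    def __init__(self, feat_dim=256):
        super().__init__()
        self.embed = nn.Sequential(
            nn.Conv3d(1, 16, 4, stride=2, padding=1), nn.ReLU(),   # 32 -> 16
            nn.Conv3d(16, 32, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 8
            nn.Flatten(),
            nn.Linear(32 * 8 * 8 * 8, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, feat_dim, batch_first=True)

    def forward(self, voxels):
        # voxels: (B, K, 32, 32, 32), K retrieved shapes per image
        B, K = voxels.shape[:2]
        x = self.embed(voxels.reshape(B * K, 1, *voxels.shape[2:]))
        _, (h, _) = self.lstm(x.view(B, K, -1))
        return h[-1]  # (B, feat_dim) shape-prior vector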
Experimental results demonstrate that Mem3D significantly improves
reconstruction quality and performs favorably against state-of-the-art methods
on the ShapeNet and Pix3D datasets.
Authors
Shuo Yang, Min Xu, Haozhe Xie, Stuart Perry, Jiahao Xia