We present an approach to enhancing the realism of synthetic images. The
images are enhanced by a convolutional network that leverages intermediate
representations produced by conventional rendering pipelines. The network is
trained via a novel adversarial objective, which provides strong supervision at
multiple perceptual levels. We analyze scene layout distributions in commonly
used datasets and find that they differ in important ways. We hypothesize that
this is one of the causes of strong artifacts that can be observed in the
results of many prior methods. To address this, we propose a new strategy for
sampling image patches during training. We also introduce multiple
architectural improvements in the deep network modules used for photorealism
enhancement. We confirm the benefits of our contributions in controlled
experiments and report substantial gains in stability and realism in comparison
to recent image-to-image translation methods and a variety of other baselines.
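
The abstract's patch-sampling strategy could, under stated assumptions, look something like the following minimal sketch: patch centers are drawn with probability proportional to per-pixel semantic-class weights, so that content that is rare in one dataset's scene layout is still sampled often during training. The function name, the weighting scheme, and all parameters here are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def sample_patch_centers(label_map, class_weights, n_patches, rng=None):
    """Sample patch centers weighted by the semantic class of each pixel.

    label_map     : (H, W) int array of per-pixel semantic class ids.
    class_weights : 1-D array mapping class id -> sampling weight
                    (e.g. inverse class frequency to rebalance layouts).
    n_patches     : number of distinct patch centers to draw.
    """
    rng = rng or np.random.default_rng(0)
    # Per-pixel sampling weight, looked up from the pixel's class id.
    w = class_weights[label_map].astype(np.float64)
    p = (w / w.sum()).ravel()
    # Draw distinct pixel indices with the layout-aware probabilities.
    flat = rng.choice(label_map.size, size=n_patches, replace=False, p=p)
    rows, cols = np.unravel_index(flat, label_map.shape)
    return list(zip(rows.tolist(), cols.tolist()))

# Toy usage: an 8x8 map whose left half is class 0 (weight 0, never
# sampled) and right half is class 1 (weight 1).
labels = np.zeros((8, 8), dtype=int)
labels[:, 4:] = 1
centers = sample_patch_centers(labels, np.array([0.0, 1.0]), n_patches=5)
```

In practice, the weights would be estimated from the layout statistics the abstract refers to; this sketch only shows the sampling mechanism itself.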
Authors
Stephan R. Richter, Hassan Abu AlHaija, Vladlen Koltun