InverseRenderNet: Learning single image inverse rendering

Abstract

We show how to train a fully convolutional neural network to perform inverse rendering from a single, uncontrolled image. The network takes an RGB image as input and regresses albedo and normal maps, from which we compute lighting coefficients. Our network is trained using large uncontrolled image collections without ground truth. By incorporating a differentiable renderer, our network can learn from self-supervision. Since the problem is ill-posed, we introduce two sources of additional supervision. First, we learn a statistical natural illumination prior. Second, our key insight is to perform offline multiview stereo (MVS) on images containing rich illumination variation. From the MVS poses and depth maps, we can cross-project between overlapping views such that Siamese training can be used to ensure consistent estimation of photometric invariants. MVS depth also provides direct coarse supervision for normal map estimation. We believe this is the first attempt to use MVS supervision for learning inverse rendering.
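To make the self-supervised rendering loss concrete, the sketch below shows one plausible way to compute it: given predicted albedo and normal maps, order-2 spherical-harmonic lighting is solved by least squares so that the re-rendered Lambertian image can be compared against the input. This is an illustrative PyTorch sketch under those assumptions, not the released code; the function names (sh_basis, solve_lighting, rendering_loss) and tensor layouts are hypothetical.

```python
# Hypothetical sketch of a self-supervised rendering loss with order-2 SH lighting.
# Assumes image/albedo are (H, W, 3), normals are unit vectors (H, W, 3),
# and mask is a (H, W) boolean map of valid pixels.
import torch

def sh_basis(normals):
    """Evaluate 9 order-2 spherical-harmonic basis terms at unit normals: (..., 3) -> (..., 9)."""
    x, y, z = normals[..., 0], normals[..., 1], normals[..., 2]
    return torch.stack([
        torch.ones_like(x),        # l = 0
        y, z, x,                   # l = 1
        x * y, y * z,              # l = 2
        3.0 * z ** 2 - 1.0,
        x * z,
        x ** 2 - y ** 2,
    ], dim=-1)

def solve_lighting(image, albedo, normals, mask):
    """Least-squares SH lighting (9 coefficients per colour channel) from the
    predicted albedo and normals, so that albedo * (B(n) @ L) matches the image."""
    B = sh_basis(normals)[mask]                                   # (P, 9)
    lighting = []
    for c in range(3):
        shading = image[..., c][mask] / albedo[..., c][mask].clamp(min=1e-4)  # (P,)
        sol = torch.linalg.lstsq(B, shading.unsqueeze(-1)).solution           # (9, 1)
        lighting.append(sol.squeeze(-1))
    return torch.stack(lighting, dim=-1)                          # (9, 3)

def rendering_loss(image, albedo, normals, mask):
    """Self-supervised photometric loss: re-render with the solved lighting
    and penalise the difference to the input image over valid pixels."""
    L = solve_lighting(image, albedo, normals, mask)
    shading = sh_basis(normals) @ L                               # (H, W, 3)
    rendered = albedo * shading
    return ((rendered - image)[mask] ** 2).mean()
```

In training, a term like this would sit alongside the illumination prior and the MVS-based cross-projection (Siamese) consistency and coarse normal supervision described above.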

Publication
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019
Figure: From a single image (col. 1), we estimate albedo, normal maps, and illumination (cols. 2-4); a comparison multiview stereo result computed from several hundred images (col. 5); re-renderings of our shape with frontal/estimated lighting (cols. 6-7).
Ye Yu
Research Engineer
Will Smith
Professor in Computer Vision