Dissertation Talk: Image Synthesis for Self-Supervised Learning

Presentation | April 20 | 3-4 p.m. | 250 Sutardja Dai Hall

Richard Zhang, UC Berkeley, Department of Electrical Engineering and Computer Sciences (EECS)

We explore the use of deep networks for image synthesis, both as a graphics goal in itself and as an effective method for representation learning. We propose BicycleGAN, a general system for image-to-image translation problems, with the specific aim of capturing the multimodal nature of the output space. We study image colorization in greater detail and develop both automatic and user-guided approaches. Moreover, colorization, and cross-channel prediction more generally, is a simple but powerful pretext task for self-supervised feature learning: the network not only solves the direct graphics task, but also learns to capture patterns in the visual world, without the benefit of human-curated labels. We demonstrate strong transfer to high-level semantic tasks, such as image classification, and to low-level human perceptual judgments. For the latter, we collect a large-scale dataset of human similarity judgments and find that our learned metric outperforms traditional metrics such as PSNR and SSIM. We also find that many unsupervised and self-supervised methods transfer strongly, in some cases performing comparably to fully-supervised methods.
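To make the cross-channel prediction setup concrete, the sketch below constructs (input, target) training pairs by withholding color from an image, along with the PSNR metric the abstract mentions as a traditional baseline. The function names and the simple luma approximation are illustrative assumptions, not the talk's exact pipeline, which operates in the CIE Lab color space:

```python
import numpy as np

def make_crosschannel_pair(rgb):
    """Split an RGB image into a grayscale input and a color target.

    The network sees only the luminance channel and must predict the
    withheld color information -- the colorization pretext task.
    (Illustrative sketch: Rec. 601 luma stands in for the L channel
    of the Lab color space used in the actual work.)
    """
    rgb = rgb.astype(np.float32)
    gray = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    return gray[..., None], rgb  # (network input, prediction target)

def psnr(ref, est, peak=255.0):
    """Peak signal-to-noise ratio, one of the traditional full-reference
    metrics that the learned perceptual metric is compared against."""
    mse = np.mean((ref.astype(np.float64) - est.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```

No labels are needed to form these pairs; the supervisory signal comes from the image itself, which is what makes the task self-supervised.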