Seminar | October 2 | 3-4 p.m. | 400 Cory Hall
Amir Gholami, UC Berkeley
Model size and inference speed have become major challenges in the deployment of neural networks for many applications. A promising approach to addressing these challenges is quantization. However, existing quantization methods rely on ad-hoc approaches and tricks that do not generalize across models and require significant hand tuning. To address this, we have recently developed a new systematic approach to model compression using second-order information, resulting in unprecedentedly small models for a range of challenging problems. I will first discuss the Hessian-based algorithm, and then present results showing significant improvements in quantization for a range of modern networks, including (i) ResNet50/152, Inception-V3, and SqueezeNext on ImageNet, (ii) RetinaNet-ResNet50 on the Microsoft COCO dataset for object detection, and (iii) the BERT model for natural language processing. All results are obtained without any expensive ad-hoc search, yet exceed all industry-level results, including those from expensive AutoML-based methods searched at massive scale.
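The second-order sensitivity signal behind this line of work is typically the trace (or top eigenvalues) of each layer's Hessian: layers whose loss surface is flatter tolerate lower bit-widths. A minimal sketch of the standard Hutchinson stochastic trace estimator, which measures this without forming the Hessian explicitly (illustrative only; the toy matrix and function names are my own, and real implementations access the Hessian solely through Hessian-vector products computed by automatic differentiation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy symmetric "Hessian" standing in for one layer's curvature.
# In practice H is never materialized; only matrix-vector products
# H @ v are available via a double-backward pass.
H = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 0.5],
              [0.0, 0.5, 2.0]])

def hutchinson_trace(hvp, dim, n_samples=10000):
    """Estimate tr(H) as the mean of z^T (H z) over Rademacher vectors z.

    hvp: callable returning the Hessian-vector product H @ z.
    """
    total = 0.0
    for _ in range(n_samples):
        z = rng.choice([-1.0, 1.0], size=dim)  # Rademacher probe vector
        total += z @ hvp(z)
    return total / n_samples

est = hutchinson_trace(lambda v: H @ v, H.shape[0])
exact = float(np.trace(H))
```

A mixed-precision scheme can then rank layers by this estimated trace (often normalized by parameter count) and assign fewer bits to the least sensitive layers.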