Scientific Computing and Matrix Computations Seminar: On Matrix Sparsification and Quantization for Efficient and Scalable Deep Learning
Seminar: Scientific Computing: CS | October 10 | 11 a.m.-12 p.m. | Soda Hall
Wei Wen, Duke University
Deep Learning has brought breakthroughs in computer vision, speech recognition, natural language processing and other applications. These breakthroughs are achieved by feeding huge amounts of data into large-scale Deep Neural Networks (DNNs). The large scale brings new challenges to both DNN inference and DNN training. In inference, matrix multiplication is the computational core, and its scale is unaffordable for real-time AI on edge devices (such as drones, mobile devices and robots). Therefore, sparse DNNs are required for real-time execution. In this talk, I will introduce our general learning algorithms, which sparsify dimensions, blocks and ranks of matrices in DNNs for efficient inference. I will show that sparsifying matrices in this way is equivalent to removing DNN structures (including neurons, filters, channels, layers, hidden states/cells, etc.). Our final compact DNNs have regular structures, as traditional DNNs do, so they can be deployed directly without any software or hardware tweaking.
The second part of the talk will cover our recent progress on scalable distributed Deep Learning. In DNN training, distributed systems are usually used to boost computing power; however, communication becomes a new speed bottleneck because of gradient synchronization. I will introduce our SGD variant (TernGrad) to overcome this bottleneck. In TernGrad, 32-bit floating-point gradients are stochastically quantized to only 3 levels (i.e., ternary gradients), so that each gradient needs less than 2 bits to encode, thereby significantly reducing communication.
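For readers unfamiliar with stochastic ternary quantization, the idea can be sketched in a few lines of NumPy. The abstract does not state the exact quantization rule, so this sketch makes assumptions: the scaler `s` is taken to be the largest absolute gradient value, and each component is kept with probability `|g_i| / s`, which makes the decoded gradient equal to the original in expectation. The function name `ternarize` and the example values are hypothetical, chosen only for illustration.

```python
import numpy as np


def ternarize(grad, rng):
    """Stochastically quantize `grad` to the three levels {-s, 0, +s}.

    Assumption (not stated in the abstract): s = max|grad|, and component i
    is kept with probability |grad[i]| / s, so that E[s * t] == grad.
    Returns the ternary codes t (values in {-1, 0, +1}) and the scaler s.
    """
    s = float(np.max(np.abs(grad)))
    if s == 0.0:
        # All-zero gradient: nothing to quantize.
        return np.zeros(grad.shape, dtype=np.int8), s
    # Bernoulli mask: True with probability |g_i| / s.
    keep = rng.random(grad.shape) < (np.abs(grad) / s)
    t = (np.sign(grad) * keep).astype(np.int8)
    return t, s


rng = np.random.default_rng(0)
grad = np.array([0.5, -1.0, 0.0, 0.25], dtype=np.float32)

t, s = ternarize(grad, rng)
decoded = s * t  # what a receiving worker would reconstruct

# Averaging many independent quantizations approaches the original gradient.
samples = np.stack([ternarize(grad, rng)[0] * s for _ in range(4000)])
mean_decoded = samples.mean(axis=0)
```

Only the codes `t` and the single scaler `s` would need to be communicated in such a scheme. Since each code takes one of three values, it carries at most log2(3) ≈ 1.585 bits, consistent with the "less than 2 bits" figure above, compared with 32 bits for a floating-point gradient.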