Dissertation Talk: Avoiding Communication in First-Order Methods for Optimization
Presentation | April 25 | 1-2 p.m. | Soda Hall, 306 (HP Auditorium)
On modern parallel computing architectures the movement of data between processors over a network is often the most expensive operation in terms of both time and energy. Recent work in numerical linear algebra has shown that existing algorithms can be re-formulated to avoid communication and achieve large sequential and parallel speedups over existing approaches.
Parallel computing has played an important role in speeding up optimization methods for big data analytics and large-scale machine learning (ML). However, the scalability of these optimization methods is limited by the cost of communicating and synchronizing processors. Iterative ML methods are particularly sensitive to communication cost since they often require communication every iteration.
This work extends well-known techniques from Communication-Avoiding Krylov subspace methods to first-order, block coordinate descent (BCD) methods. We derive communication-avoiding variants of BCD methods that solve popular regression, classification, and related kernel problems. These variants reduce the latency cost by a tunable factor of 's' at the expense of a factor of 's' increase in flops and bandwidth costs. We show that these variants are numerically stable and can attain large speedups of up to 6.1x on a Cray XC30 supercomputer.