Statistical Inference for Stochastic Approximation and Online Learning via Hierarchical Incremental Gradient Descent
Seminar | December 6 | 4-5 p.m. | 1011 Evans Hall
Weijie Su, University of Pennsylvania
Stochastic gradient descent (SGD) is an immensely popular approach for optimization in settings where data arrives in a stream or data sizes are very large. Despite an ever-increasing volume of work on SGD, relatively little is known about the statistical inferential properties of predictions based on SGD solutions. In this paper, we introduce a novel procedure termed HiGrad to conduct inference on predictions, without incurring additional computational cost compared with vanilla SGD. HiGrad begins by performing SGD iterations for a while and then splits the single thread into a few, and it operates hierarchically in this fashion along each thread. With predictions provided by multiple threads in place, a t-based confidence interval is constructed by decorrelating the predictions using covariance structures given by the Ruppert-Polyak averaging scheme. Under certain regularity conditions, the HiGrad confidence interval is shown to attain asymptotically exact coverage probability. Finally, the performance of HiGrad is evaluated through extensive simulation studies and a real data example.
This is joint work with Yuancheng Zhu.
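The split-and-average idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a toy problem (estimating a scalar mean by SGD on a squared loss), a single split into a two-level tree, segment weights proportional to segment length, and the Ruppert-Polyak heuristic that segment averages are roughly independent with variance proportional to one over the segment length. The names `sgd_segment` and `higrad_two_level`, the step-size schedule, and the hard-coded t quantile are all hypothetical choices made for illustration.

```python
import math
import random


def sgd_segment(theta, n_steps, start_step, sample, lr0=0.5, decay=0.55):
    """Run n_steps of SGD on the loss 0.5 * (theta - z)**2.

    Returns the last iterate and the average of the iterates in this
    segment (Ruppert-Polyak averaging). The step counter continues from
    start_step so the step size keeps decaying along a thread.
    """
    total = 0.0
    for j in range(start_step + 1, start_step + n_steps + 1):
        z = sample()                      # one fresh observation from the stream
        theta -= lr0 * j ** (-decay) * (theta - z)
        total += theta
    return theta, total / n_steps


def higrad_two_level(sample, n0, n1, n_threads, t_quantile, theta0=0.0):
    """Two-level sketch: one shared segment, then n_threads parallel segments.

    t_quantile must be the desired quantile of a t distribution with
    n_threads - 1 degrees of freedom (assumed to be supplied by the caller).
    """
    # Level 0: a single thread of length n0.
    theta, avg0 = sgd_segment(theta0, n0, 0, sample)

    # Level 1: each thread restarts from the same iterate on fresh data.
    w0, w1 = n0 / (n0 + n1), n1 / (n0 + n1)   # assumed weights
    estimates = []
    for _ in range(n_threads):
        _, avg1 = sgd_segment(theta, n1, n0, sample)
        estimates.append(w0 * avg0 + w1 * avg1)

    k = n_threads
    mean = sum(estimates) / k
    s2 = sum((m - mean) ** 2 for m in estimates) / (k - 1)

    # Threads share the level-0 segment, so they are correlated. Under the
    # assumed covariance a*I + b*11^T (up to a common scale), the variance
    # of the mean is estimated by s2 * (1/k + b/a).
    a = w1 ** 2 / n1    # thread-specific part
    b = w0 ** 2 / n0    # shared part
    half_width = t_quantile * math.sqrt(s2 * (1.0 / k + b / a))
    return mean, (mean - half_width, mean + half_width)


if __name__ == "__main__":
    rng = random.Random(0)
    # 3.182 is the 0.975 quantile of a t distribution with 3 degrees of freedom.
    est, (lo, hi) = higrad_two_level(
        lambda: rng.gauss(2.0, 1.0), n0=2000, n1=2000, n_threads=4, t_quantile=3.182
    )
    print(f"estimate = {est:.4f}, 95% interval = ({lo:.4f}, {hi:.4f})")
```

Note that the total number of observations consumed is n0 + n_threads * n1, so the sketch costs the same as one SGD pass over that many data points. The correction factor b/a reflects the variance contributed by the shared first segment, which the spread between threads cannot reveal, so a naive t interval on the thread estimates would be too narrow.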