Renewable Estimation and Incremental Inference in Generalized Linear Models with Streaming Data

Seminar | April 10 | 4-5 p.m. | 1011 Evans Hall

Peter Song, University of Michigan

Department of Statistics

I will present a new statistical paradigm for the analysis of streaming data, based on renewable estimation and incremental inference in the context of generalized linear models. The proposed renewable estimation sequentially updates the maximum likelihood estimate and the associated inference using only the current data and summary statistics of the historical data, with no use of the historical raw data themselves. For the implementation, we design a new data flow, called the Rho architecture, that accommodates the storage of current and historical data and communicates with the computing layer of the Spark system to facilitate sequential learning. We establish both estimation consistency and asymptotic normality for renewable estimation and incremental inference on the regression parameters. We illustrate the methods with numerical examples from simulation experiments and real-world data analysis. This is joint work with Lan Luo.
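
To make the idea of updating from summary statistics concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the speakers' implementation: it assumes a logistic model with an intercept and one covariate, and it takes the summary statistics to be the previous estimate together with the accumulated information matrix. The function names (`score_and_info`, `solve2`, `renew`, `std_errors`) and the simulated data are hypothetical, and the sketch does not touch the Rho architecture or Spark.

```python
import math
import random


def score_and_info(batch, beta):
    """Score vector and information matrix of a logistic model on one batch.

    batch is a list of (x, y) pairs; the assumed model is
    logit P(y = 1 | x) = beta[0] + beta[1] * x.
    The symmetric 2x2 information matrix is returned as (j00, j01, j11).
    """
    u0 = u1 = j00 = j01 = j11 = 0.0
    for x, y in batch:
        p = 1.0 / (1.0 + math.exp(-(beta[0] + beta[1] * x)))
        r, w = y - p, p * (1.0 - p)
        u0 += r
        u1 += r * x
        j00 += w
        j01 += w * x
        j11 += w * x * x
    return (u0, u1), (j00, j01, j11)


def solve2(j, v):
    """Solve the symmetric 2x2 system [[j00, j01], [j01, j11]] d = v."""
    j00, j01, j11 = j
    det = j00 * j11 - j01 * j01
    return ((j11 * v[0] - j01 * v[1]) / det,
            (j00 * v[1] - j01 * v[0]) / det)


def renew(beta_prev, info_prev, batch, iters=50, tol=1e-10):
    """Update the estimate from (beta_prev, info_prev) and the new batch only.

    Assumed estimating equation (a sketch, not necessarily the talk's exact
    form): info_prev @ (beta_prev - beta) + score_batch(beta) = 0,
    solved by Newton iterations. No earlier raw data are needed.
    """
    beta = beta_prev
    for _ in range(iters):
        (u0, u1), (b00, b01, b11) = score_and_info(batch, beta)
        d0, d1 = beta_prev[0] - beta[0], beta_prev[1] - beta[1]
        g = (info_prev[0] * d0 + info_prev[1] * d1 + u0,
             info_prev[1] * d0 + info_prev[2] * d1 + u1)
        h = (info_prev[0] + b00, info_prev[1] + b01, info_prev[2] + b11)
        step = solve2(h, g)
        beta = (beta[0] + step[0], beta[1] + step[1])
        if abs(step[0]) + abs(step[1]) < tol:
            break
    # Accumulate the information of the new batch at the updated estimate.
    _, (b00, b01, b11) = score_and_info(batch, beta)
    info = (info_prev[0] + b00, info_prev[1] + b01, info_prev[2] + b11)
    return beta, info


def std_errors(info):
    """Standard errors from the diagonal of the inverse information matrix."""
    j00, j01, j11 = info
    det = j00 * j11 - j01 * j01
    return (math.sqrt(j11 / det), math.sqrt(j00 / det))


# Simulated stream (hypothetical data): 5 batches of 1000 observations.
random.seed(0)
true_beta = (-0.5, 1.0)
data = []
for _ in range(5000):
    x = random.gauss(0.0, 1.0)
    p = 1.0 / (1.0 + math.exp(-(true_beta[0] + true_beta[1] * x)))
    data.append((x, 1 if random.random() < p else 0))
batches = [data[i:i + 1000] for i in range(0, len(data), 1000)]

beta, info = (0.0, 0.0), (0.0, 0.0, 0.0)  # empty summary before any data
for batch in batches:
    beta, info = renew(beta, info, batch)  # each batch can be discarded afterwards

print("renewable estimate:", beta)
print("standard errors:", std_errors(info))
```

With an empty summary, the first call reduces to an ordinary Newton fit on the first batch; every later call carries forward only two numbers for the estimate and three for the information matrix, which is what lets the raw batches be dropped once processed.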