Support points – a new way to reduce big and high-dimensional data

Seminar | January 28 | 4-5 p.m. | 1011 Evans Hall

 Simon Mak, Georgia Institute of Technology

 Department of Statistics

This talk presents a new method for reducing big and high-dimensional data into a smaller dataset, called support points (SPs). In an era where data is plentiful but downstream analysis is oftentimes expensive, SPs can be used to tackle many big data challenges in statistics, engineering and machine learning. SPs have two key advantages over existing methods. First, SPs provide optimal and model-free reduction of big data for a broad range of downstream analyses. Second, SPs can be efficiently computed via parallelized difference-of-convex optimization; this allows us to reduce millions of data points to a representative dataset in mere seconds. SPs also enjoy appealing theoretical guarantees, including distributional convergence and improved reduction over random sampling and clustering-based methods. The effectiveness of SPs is then demonstrated in two real-world applications, the first for reducing long Markov Chain Monte Carlo (MCMC) chains for rocket engine design, and the second for data reduction in computationally intensive predictive modeling.