Subtle but not malicious? The (high) computational cost of non-smoothness in learning from big data

Seminar | May 3 | 4-5 p.m. | 1011 Evans Hall

 Mikhail Belkin, Department of Computer Science and Engineering, Ohio State University

 Department of Statistics

What can we learn from big data? First, more data allows us to estimate the probabilities of uncertain outcomes more precisely. Second, more data provides better coverage of the input space, allowing functions to be approximated more accurately. I will argue that the second point is key to understanding the recent success of large-scale machine learning. A useful way of thinking about this is that principal component regression/classification needs many more components, perhaps almost as many as there are data points. It turns out that fundamental computational barriers prevent some standard techniques (e.g., kernel methods) from utilizing sufficiently many principal components on large datasets. These computational limitations result in over-regularization and a failure to benefit from big data.
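To make the role of the number of components concrete, below is a minimal sketch of kernel principal component regression on a synthetic non-smooth target. The kernel choice, bandwidth, sample size, and component counts are illustrative assumptions, not details from the talk; the point is only that using too few components acts like heavy regularization (underfitting), while allowing many components lets the fit track a non-smooth target.

```python
# Minimal sketch (not from the talk): kernel principal component regression
# on a synthetic non-smooth target. Kernel, bandwidth, and component counts
# are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 500
X = rng.uniform(-3, 3, size=(n, 1))
y = np.sign(np.sin(3 * X[:, 0])) + 0.1 * rng.standard_normal(n)  # non-smooth target

def rbf_kernel(A, B, bandwidth=0.5):
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2 * bandwidth ** 2))

K = rbf_kernel(X, X)
eigvals, eigvecs = np.linalg.eigh(K)                 # ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # sort descending

def fit_kernel_pcr(k, tol=1e-8):
    """Dual coefficients for least squares restricted to the top-k kernel
    principal components; numerically negligible components are dropped."""
    k = min(k, int((eigvals > tol * eigvals[0]).sum()))
    V, lam = eigvecs[:, :k], eigvals[:k]
    return V @ ((V.T @ y) / lam), k

X_test = np.linspace(-3, 3, 400)[:, None]
y_test = np.sign(np.sin(3 * X_test[:, 0]))
K_test = rbf_kernel(X_test, X)

for k in (5, 20, 100, n):   # few components = heavy implicit regularization
    alpha, k_used = fit_kernel_pcr(k)
    test_mse = np.mean((K_test @ alpha - y_test) ** 2)
    print(f"requested={k:4d}  used={k_used:4d}  test MSE={test_mse:.3f}")
```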

I will discuss the nature of these barriers and how they can be overcome. In particular, I will show a simple kernel algorithm (EigenPro) that demonstrates significant and consistent improvements over the state of the art on large datasets.
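The abstract names EigenPro but does not spell out the algorithm. As a rough, hedged illustration of the underlying idea only, the sketch below applies a top-k spectral preconditioner to full-batch Richardson iteration for kernel least squares: damping the top eigendirections of the kernel lets the step size scale with the (k+1)-th eigenvalue instead of the first. The preconditioner form, step size, kernel, and data here are my assumptions; as I understand it, Ma and Belkin's actual EigenPro is a stochastic method with a preconditioner estimated from a subsample.

```python
# Rough sketch of spectral preconditioning in the spirit of EigenPro,
# applied as full-batch Richardson iteration on K @ alpha = y.
# Not the authors' algorithm; all specifics below are assumptions.
import numpy as np

def top_k_preconditioner(K, k):
    """P = I - sum_{i<=k} (1 - lam_{k+1}/lam_i) v_i v_i^T.
    Damps the top-k eigendirections of K so the largest eigenvalue of
    P @ K drops from lam_1 to lam_{k+1}, permitting a much larger step."""
    lam, V = np.linalg.eigh(K)
    lam, V = lam[::-1], V[:, ::-1]          # descending eigenvalues
    Vk = V[:, :k]
    scale = 1.0 - lam[k] / lam[:k]          # lam[k] is the (k+1)-th eigenvalue
    P = np.eye(K.shape[0]) - (Vk * scale) @ Vk.T
    return P, lam

def preconditioned_richardson(K, y, k=20, n_iter=200):
    """Iterate alpha <- alpha + eta * P @ (y - K @ alpha)."""
    P, lam = top_k_preconditioner(K, k)
    eta = 1.0 / lam[k]                      # step limited by lam_{k+1}, not lam_1
    alpha = np.zeros_like(y)
    for _ in range(n_iter):
        alpha = alpha + eta * (P @ (y - K @ alpha))
    return alpha

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(300, 1))
    y = np.sign(np.sin(3 * X[:, 0]))
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-sq_dists / (2 * 0.5 ** 2))  # RBF kernel, bandwidth 0.5
    alpha = preconditioned_richardson(K, y, k=20)
    print("train MSE:", np.mean((K @ alpha - y) ** 2))
```

Without the preconditioner, the step size would be limited to roughly 1/lam_1, and the directions associated with small eigenvalues (the many components needed for non-smooth targets) would barely move in any reasonable number of iterations.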

Based on joint work with Siyuan Ma.

 pengdingpku@berkeley.edu