Seminar 217, Risk Management: Robust Learning: Information Theory and Algorithms
Seminar | October 9 | 11 a.m.-12:30 p.m. | 1011 Evans Hall
Speaker: Jacob Steinhardt, Stanford
This talk will provide an overview of recent results in high-dimensional robust estimation. The key question is the following: given a dataset, some fraction of which consists of arbitrary outliers, what can be learned about the non-outlying points? This is a classical question going back at least to Tukey (1960). However, this question has recently received renewed interest for a combination of reasons. First, many of the older results do not give meaningful error bounds in high dimensions (for instance, the error often includes an implicit sqrt(d)-factor in d dimensions). Second, recent connections have been established between robust estimation and other problems such as clustering and learning of stochastic block models. Currently, the best known results for clustering mixtures of Gaussians are via these robust estimation techniques. Finally, high-dimensional biological datasets with structured outliers such as batch effects, together with security concerns for machine learning systems, motivate the study of robustness to worst-case outliers from an applied direction.
The talk will cover both information-theoretic and algorithmic techniques in robust estimation, aiming to give an accessible introduction. We will start by reviewing the 1-dimensional case, and show that many natural estimators break down in higher dimensions. Then we will give a simple argument that robust estimation is information-theoretically possible. Finally, we will show that under stronger assumptions we can perform robust estimation efficiently, via a "dual coupling" inequality that is reminiscent of matrix concentration inequalities.