Dissertation Talk: Measuring Generalization and Overfitting in Machine Learning

Seminar: Dissertation Talk: CS | May 6 | 1-2 p.m. | 405 Soda Hall

 Rebecca Roelofs

 Electrical Engineering and Computer Sciences (EECS)

Because machine learning (ML) algorithms are so prevalent and their decisions can profoundly affect billions of human lives, it is crucial that these algorithms be robust, reliable, and understandable. This thesis examines key theoretical pillars of ML surrounding generalization and overfitting, and tests the extent to which empirical behavior matches existing theory. We develop novel methods for measuring overfitting and generalization, and we characterize how reproducible the observed behavior is across differences in optimization algorithm, dataset, task, evaluation metric, and domain.

First, we examine how optimization algorithms bias ML models towards solutions with varying generalization properties. We show that adaptive gradient methods empirically find solutions that generalize worse than those found by stochastic gradient descent, and we construct an example using a simple overparameterized model that corroborates the algorithms’ empirical behavior on neural networks.
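As a rough illustration of how the choice of optimizer can select among many solutions that fit the training data, the sketch below runs plain gradient descent and an AdaGrad-style update on an underdetermined least-squares problem. It is not the construction from the thesis: the random data, the problem sizes, the step sizes, and the function names are all assumptions made for this example. It relies only on the standard fact that gradient descent started from zero converges to the minimum-norm interpolating solution.

```python
import numpy as np

# Hypothetical toy problem: more parameters (d) than training points (n),
# so infinitely many weight vectors fit the training data exactly.
rng = np.random.default_rng(0)
n, d = 20, 100
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)


def gradient(w):
    """Gradient of the loss (1 / 2n) * ||X w - y||^2."""
    return X.T @ (X @ w - y) / n


def run_gd(steps=5000, lr=0.01):
    """Plain gradient descent started from the zero vector."""
    w = np.zeros(d)
    for _ in range(steps):
        w -= lr * gradient(w)
    return w


def run_adagrad(steps=5000, lr=0.01, eps=1e-8):
    """AdaGrad-style update: per-coordinate steps scaled by past gradients."""
    w = np.zeros(d)
    grad_sq_sum = np.zeros(d)
    for _ in range(steps):
        g = gradient(w)
        grad_sq_sum += g * g
        w -= lr * g / (np.sqrt(grad_sq_sum) + eps)
    return w


# Minimum-norm solution of X w = y, computed via the pseudoinverse.
w_min_norm = np.linalg.pinv(X) @ y
w_gd = run_gd()
w_ada = run_adagrad()

for name, w in [("min-norm", w_min_norm), ("GD", w_gd), ("AdaGrad", w_ada)]:
    residual = np.linalg.norm(X @ w - y)
    print(f"{name:9s} train residual = {residual:.2e}, norm = {np.linalg.norm(w):.4f}")
```

In this toy setting, gradient descent recovers the minimum-norm solution, while the per-coordinate rescaling moves the adaptive iterates out of the row space of X, so they end up at a different point. Whether that difference helps or hurts on held-out data depends on the problem; the sketch only shows that the optimizer, not just the loss, determines which solution is found.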

Next, we study the extent to which ML models have overfit to commonly reused datasets in both academic benchmarks and ML competitions. We build new test sets for the CIFAR-10 and ImageNet datasets and evaluate a broad range of classification models on the new datasets. All models experience a drop in accuracy, which indicates that current accuracy numbers are susceptible to even minute natural variations in the data distribution. Surprisingly, despite several years of adaptive model selection on these competitive benchmarks, we find no evidence of overfitting. We then analyze data from the ML platform Kaggle and show that overfitting is also absent in ML competitions.
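A minimal sketch of how an accuracy drop between an original and a new test set might be quantified is shown below. The counts are hypothetical placeholders chosen for illustration, not results from the thesis, and the normal-approximation interval and the helper name `accuracy_with_interval` are assumptions of this example rather than the author's procedure.

```python
import math


def accuracy_with_interval(num_correct, num_total, z=1.96):
    """Return (accuracy, lower, upper) using a normal-approximation interval."""
    acc = num_correct / num_total
    half_width = z * math.sqrt(acc * (1.0 - acc) / num_total)
    return acc, acc - half_width, acc + half_width


# Hypothetical counts for one model; NOT measurements from the thesis.
original = accuracy_with_interval(num_correct=9300, num_total=10000)
new = accuracy_with_interval(num_correct=1700, num_total=2000)

# Accuracy gap between the original and the new test set.
gap = original[0] - new[0]

print(f"original test set: {original[0]:.3f} [{original[1]:.3f}, {original[2]:.3f}]")
print(f"new test set:      {new[0]:.3f} [{new[1]:.3f}, {new[2]:.3f}]")
print(f"accuracy gap:      {gap:.3f}")
```

Reporting an interval alongside each accuracy helps separate a genuine drop from sampling noise, which matters when the new test set is smaller than the original one.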

Overall, our work suggests that the true concern for robust ML is distribution shift rather than overfitting, and that designing models that still work reliably in dynamic environments is a challenging but necessary undertaking.

 CA, roelofs@berkeley.edu, 3024899818