Algorithmic Regularization in Over-parameterized Matrix Recovery and Neural Networks with Quadratic Activations

Seminar | February 28 | 4-5 p.m. | 1011 Evans Hall

 Tengyu Ma, Facebook AI Research

 Department of Statistics

Over-parameterized models are widely and successfully used in deep learning, but why they work is far from understood. In many practical scenarios, the learned model generalizes to the test data even though no regularization is applied and the hypothesis class contains a model that completely overfits the training data.

In this talk, we will show that this phenomenon occurs in over-parameterized matrix recovery models as well, and prove that the gradient descent algorithm provides additional regularization power that prevents overfitting. The result can be extended to learning one-hidden-layer neural networks with quadratic activations. The key insight is that gradient descent prefers to search through the set of low-complexity (that is, low-rank) models first, and converges to a low-complexity model with a good training error if such a model exists.
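
To make the low-rank preference concrete, here is a minimal sketch of the kind of experiment the abstract describes. It is an illustration under stated assumptions, not the speakers' code: the function name run_sketch, the symmetric Gaussian measurement matrices, the loss scaling, the small initialization alpha * I, and all numeric defaults are choices made for this example. A full d x d factor U is fit to m linear measurements of a rank-r matrix, with m smaller than the d(d+1)/2 degrees of freedom of a symmetric matrix, so the measurements alone do not determine the answer.

```python
import numpy as np

def run_sketch(d=30, r=2, m=360, alpha=1e-4, eta=0.25, steps=300, seed=0):
    """Gradient descent on an over-parameterized factor U (d x d) for
    matrix sensing. All defaults are illustrative assumptions."""
    rng = np.random.default_rng(seed)

    # Ground truth: a rank-r PSD matrix X* = Q Q^T with orthonormal Q.
    Q, _ = np.linalg.qr(rng.standard_normal((d, r)))
    X_star = Q @ Q.T

    # Symmetric Gaussian measurement matrices A_k and labels y_k = <A_k, X*>.
    G = rng.standard_normal((m, d, d))
    A = (G + G.transpose(0, 2, 1)) / 2
    y = np.einsum("kij,ij->k", A, X_star)

    # Small initialization; U has d columns, far more than the true rank r.
    U = alpha * np.eye(d)

    losses = []
    for _ in range(steps):
        resid = np.einsum("kij,ij->k", A, U @ U.T) - y
        # Loss f(U) = (1 / 4m) * sum_k resid_k^2.
        losses.append(float(np.mean(resid ** 2)) / 4)
        # For symmetric A_k the gradient of f is M @ U,
        # with M = (1/m) * sum_k resid_k * A_k.
        M = np.einsum("k,kij->ij", resid, A) / m
        U = U - eta * (M @ U)

    X_hat = U @ U.T
    rel_err = np.linalg.norm(X_hat - X_star) / np.linalg.norm(X_star)
    sing = np.linalg.svd(X_hat, compute_uv=False)
    return losses, rel_err, sing

if __name__ == "__main__":
    losses, rel_err, sing = run_sketch()
    print("first / last training loss:", losses[0], losses[-1])
    print("relative recovery error:", rel_err)
    print("largest singular values of U U^T:", sing[:5])
```

Printing the singular values of U U^T is a simple way to check whether the iterate stays close to a rank-r matrix even though nothing in the parameterization or the loss enforces low rank. Rerunning with a much larger alpha is a natural way to probe how much the effect depends on the small initialization.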

Based on joint work with Yuanzhi Li and Hongyang Zhang.