SUMMARY:Algorithmic Regularization in Over-parameterized Matrix Recovery and Neural Networks with Quadratic Activations
LOCATION:1011 Evans Hall
DESCRIPTION:Tengyu Ma\, Facebook AI Research\n\nOver-parameterized models are widely and successfully used in deep learning\, but their workings are far from understood. In many practical scenarios\, the learned model generalizes to the test data\, even though the hypothesis class contains a model that completely overfits the training data and no explicit regularization is applied.\n\nIn this talk\, we will show that this phenomenon also occurs in over-parameterized matrix recovery models\, and prove that the gradient descent algorithm provides additional regularization power that prevents overfitting. The result can be extended to learning one-hidden-layer neural networks with quadratic activations. The key insight is that gradient descent searches through the set of low-complexity (that is\, low-rank) models first\, and converges to a low-complexity model with good training error if such a model exists.\n\nBased on joint work with Yuanzhi Li and Hongyang Zhang. https://arxiv.org/abs/1712.09203
