Seminar | October 9 | 3:30-5 p.m. | 3108 Etcheverry Hall
Peter Bartlett, UC Berkeley Departments of Statistics and EECS
Deep neural networks have improved state-of-the-art performance for prediction problems across an impressive range of application areas, and they have become a central ingredient in AI systems. This talk considers factors that affect their performance, describing some recent results in two directions. First, we investigate the impact of depth on representation and optimization properties of these networks. We focus on deep residual networks, which have been widely adopted for computer vision applications because they exhibit fast training, even for very deep networks. We show that as the depth of these networks increases, they are able to represent a smooth invertible map with a simpler representation at each layer, and that this implies a desirable property of the functional optimization landscape that arises from regression with deep function compositions: stationary points are global optima. Second, we consider the generalization behavior of deep networks, that is, how their performance on training data compares to their predictive accuracy on unseen data. In particular, we aim to understand how to measure the complexity of functions computed by these networks. For multiclass classification problems, we present a margin-based generalization bound that scales with a certain margin-normalized "spectral complexity," involving the product of the spectral norms of the weight matrices in the network. We show how the bound gives insight into the observed performance of these networks in practical problems.
Joint work with Steve Evans and Phil Long, and with Matus Telgarsky and Dylan Foster.