BLISS Seminar: Stabilizing Gradients for Deep Neural Networks
Seminar | April 23 | 3-4 p.m. | 540 Cory Hall
Inderjit Dhillon, A9/Amazon/UT Austin
Vanishing and exploding gradients are two of the main obstacles in training deep neural networks, especially when trying to capture long-range dependencies in recurrent neural networks (RNNs). In this talk, I will present an efficient parameterization of the transition matrix of an RNN that stabilizes the gradients that arise in its training. Specifically, we parameterize the transition matrix by its singular value decomposition (SVD), which allows us to explicitly track and control its singular values. We attain efficiency by using tools that are common in numerical linear algebra, namely Householder reflectors for representing the orthogonal matrices that arise in the SVD. We present results on the Inline Search Suggestions (ISS) application at Amazon Search.
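
For readers unfamiliar with the construction, the idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the speaker's implementation. It assumes the transition matrix is written as W = U diag(s) V^T, with U and V each a product of Householder reflectors, and it assumes (as one possible way to "control" the singular values) that they are squashed into the interval (1 - eps, 1 + eps) with a sigmoid. All function names, the choice of eps, and the number of reflectors are hypothetical.

```python
import numpy as np

def apply_reflectors(vectors, h):
    """Apply H_1 H_2 ... H_k to h, where H_i = I - 2 u_i u_i^T / (u_i^T u_i)."""
    out = h.copy()
    # The last reflector in the product acts on h first.
    for u in vectors[::-1]:
        out = out - 2.0 * u * (u @ out) / (u @ u)
    return out

def apply_reflectors_transposed(vectors, h):
    """Apply (H_1 ... H_k)^T = H_k ... H_1 to h (each H_i is symmetric)."""
    out = h.copy()
    for u in vectors:
        out = out - 2.0 * u * (u @ out) / (u @ u)
    return out

def singular_values(raw, eps):
    """Assumed constraint: map free parameters into (1 - eps, 1 + eps)."""
    return 1.0 + eps * (2.0 / (1.0 + np.exp(-raw)) - 1.0)

def transition_matvec(u_vecs, v_vecs, raw, eps, h):
    """Compute W h with W = U diag(s) V^T, without ever forming W."""
    s = singular_values(raw, eps)
    return apply_reflectors(u_vecs, s * apply_reflectors_transposed(v_vecs, h))

# Hypothetical sizes and random parameters, for illustration only.
rng = np.random.default_rng(0)
n, k, eps = 6, 6, 0.05
u_vecs = rng.standard_normal((k, n))  # Householder vectors defining U
v_vecs = rng.standard_normal((k, n))  # Householder vectors defining V
raw = rng.standard_normal(n)          # unconstrained singular-value parameters

# Form W column by column only to inspect its singular values.
W = np.column_stack(
    [transition_matvec(u_vecs, v_vecs, raw, eps, e) for e in np.eye(n)]
)
sv = np.linalg.svd(W, compute_uv=False)
print(sv)
```

In this sketch each reflector costs O(n) to apply, so a matrix-vector product with k reflectors per orthogonal factor costs O(nk) rather than the O(n^2) of a dense multiply, and the singular values of W stay within the chosen band no matter how the free parameters move during training.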