BLISS Seminar: Finite Sample Convergence Bounds of Off-Policy Reinforcement Learning Algorithms

Seminar | February 11 | 1-2 p.m. | 400 Cory Hall

 Siva Theja Maguluri, Georgia Tech

 Electrical Engineering and Computer Sciences (EECS)

The focus of our work is to obtain finite-sample and/or finite-time convergence bounds for various model-free Reinforcement Learning (RL) algorithms. Many RL algorithms are special cases of Stochastic Approximation (SA), a popular approach for solving fixed-point equations when the information is corrupted by noise. We first obtain finite-sample bounds for general SA using a generalized Moreau envelope as a smooth potential Lyapunov function. We then use this result to establish the first known convergence rate of the V-trace algorithm for off-policy TD-learning, and to improve the existing bound for the tabular Q-learning algorithm from polynomial in the state-space dimension to logarithmic in the dimension. Finally, we use Lyapunov drift arguments to provide finite-time error bounds for Q-learning with linear function approximation, under an assumption on the sampling policy. This talk is based on the following papers: https://arxiv.org/abs/2002.00874 and https://arxiv.org/abs/1905.11425
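For readers unfamiliar with the connection between RL and stochastic approximation, the sketch below shows the tabular Q-learning update as an SA iteration for the Bellman fixed-point equation. The environment interface, behaviour policy, and step size here are illustrative placeholders, not the specific setting analysed in the papers above.

    # Minimal sketch: tabular Q-learning as stochastic approximation.
    # `env` is a hypothetical environment whose step(a) returns (next_state, reward, done).
    import numpy as np

    def q_learning(env, num_states, num_actions, gamma=0.99, alpha=0.1, steps=10_000):
        Q = np.zeros((num_states, num_actions))
        s = env.reset()
        for _ in range(steps):
            a = np.random.randint(num_actions)         # uniform behaviour (off-)policy
            s_next, r, done = env.step(a)              # sample one noisy transition
            target = r + gamma * np.max(Q[s_next])     # noisy estimate of the Bellman operator
            Q[s, a] += alpha * (target - Q[s, a])      # stochastic-approximation step toward the fixed point
            s = env.reset() if done else s_next
        return Q

The finite-sample analyses discussed in the talk bound how quickly iterates of this kind of update approach the fixed point of the underlying (Bellman) operator.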

 vipul_gupta@berkeley.edu