Rahul Jain — Reinforcement Learning without Reinforcement

Seminar | April 16 | 3:30-4:30 p.m. | 3108 Etcheverry Hall

Rahul Jain, University of Southern California

Industrial Engineering & Operations Research

Abstract: Reinforcement Learning (RL) is concerned with solving sequential decision-making problems in the presence of uncertainty. RL really combines two problems. The first is the "Bellman problem": finding the optimal policy given the model, which may involve large state spaces. Various approximate dynamic programming and RL schemes have been developed, but they either come with no guarantees, are not universal, or are rather slow. In fact, most RL algorithms have become synonymous with stochastic approximation schemes that are known to be rather slow. The problem is even more difficult for MDPs with continuous state (and action) spaces. We present a class of RL algorithms for the continuous-state-space problem based on "empirical" ideas, which are simple, effective, and yet universal, with probabilistic guarantees. The idea combines randomized kernel-based function fitting with "empirical" updates. The key is a "probabilistic contraction analysis" method we have developed for the analysis of stochastic iterative algorithms, wherein we show convergence to a probabilistic fixed point of a sequence of random operators via a stochastic dominance argument.
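As a rough illustration of the empirical-update idea (a hypothetical sketch, not the speaker's actual algorithm), the following Python snippet runs value iteration on a toy one-dimensional continuous-state MDP: the expectation inside the Bellman backup is replaced by an empirical average over sampled transitions, and the value function is fitted at randomly chosen states with a kernel (Nadaraya-Watson) regressor. The MDP, bandwidth, and sample counts are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy continuous-state MDP on [0, 1] (hypothetical, for illustration):
# two actions nudge the state left or right; reward peaks at the centre.
GAMMA = 0.9
ACTIONS = (-0.1, +0.1)

def step(s, a):
    s_next = np.clip(s + a + rng.normal(0, 0.05), 0.0, 1.0)
    reward = 1.0 - abs(s_next - 0.5)
    return s_next, reward

def kernel_fit(centers, values, bandwidth=0.1):
    """Nadaraya-Watson regressor fitted to sampled (state, value) pairs."""
    def v(s):
        w = np.exp(-((s - centers) ** 2) / (2 * bandwidth ** 2))
        return float(w @ values / w.sum())
    return v

# "Empirical" value iteration: the expectation in the Bellman operator is
# replaced by an empirical average over m sampled transitions per action.
n_centers, m_samples = 50, 20
centers = rng.uniform(0, 1, n_centers)
V = lambda s: 0.0
for _ in range(30):
    targets = np.empty(n_centers)
    for i, s in enumerate(centers):
        q = []
        for a in ACTIONS:
            draws = [step(s, a) for _ in range(m_samples)]
            q.append(np.mean([r + GAMMA * V(sn) for sn, r in draws]))
        targets[i] = max(q)                 # empirical Bellman backup
    V = kernel_fit(centers, targets)        # randomized function fitting

# The fitted value function should prefer states near the centre.
print(V(0.5) > V(0.05))
```

The contraction-style guarantee mentioned in the abstract would concern how iterates of this random empirical operator concentrate around the true Bellman fixed point; the sketch only shows the mechanics of the update.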

The second RL problem is the "online learning (or Lai-Robbins) problem", which arises when the model itself is unknown. We propose a simple posterior-sampling-based regret-minimization reinforcement learning algorithm for MDPs. It achieves O(sqrt{T}) regret, which is order-optimal. It not only optimally manages the "exploration versus exploitation" tradeoff but also obviates the need for expensive computation for exploration. The algorithm differs from classical adaptive control in its focus on non-asymptotic regret optimality as opposed to asymptotic stability. This appears to resolve a long-standing open problem in Reinforcement Learning.
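For intuition, posterior sampling for RL (in the spirit of Thompson sampling) can be sketched on a small finite MDP: keep a Dirichlet posterior over the unknown transition probabilities, draw one plausible MDP at the start of each episode, plan optimally in the sampled model, and act greedily. The 2-state MDP, horizon, and episode count below are invented for illustration; this is not the speaker's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 2-state, 2-action MDP: rewards known, transitions unknown.
S, A, H, EPISODES = 2, 2, 10, 200
P_true = np.array([[[0.9, 0.1], [0.2, 0.8]],
                   [[0.7, 0.3], [0.05, 0.95]]])   # P_true[s, a] -> dist over s'
R = np.array([[0.0, 0.1], [1.0, 0.9]])            # R[s, a]

counts = np.ones((S, A, S))                       # Dirichlet(1, ..., 1) prior

def plan(P):
    """Finite-horizon value iteration; returns a greedy policy per step."""
    V = np.zeros(S)
    pi = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        Q = R + P @ V                             # Q[s, a] = R + E[V(s')]
        pi[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return pi

total_reward = 0.0
for ep in range(EPISODES):
    # Posterior sampling: draw one plausible MDP, plan in it, act greedily.
    P_sample = np.array([[rng.dirichlet(counts[s, a]) for a in range(A)]
                         for s in range(S)])
    pi = plan(P_sample)
    s = 0
    for h in range(H):
        a = pi[h, s]
        s_next = rng.choice(S, p=P_true[s, a])
        total_reward += R[s, a]
        counts[s, a, s_next] += 1                 # Bayesian posterior update
        s = s_next

print(total_reward)
```

The exploration-exploitation balance comes for free: early posteriors are diffuse, so sampled models (and hence policies) vary, while later posteriors concentrate and the sampled policy approaches the optimal one, with no separate exploration computation.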

Biography: Rahul Jain is the K. C. Dahlberg Early Career Chair and Associate Professor of Electrical Engineering, Computer Science* and ISE* (*by courtesy) at the University of Southern California (USC). He received a B.Tech. from IIT Kanpur, and an MA in Statistics and a PhD in EECS from the University of California, Berkeley. Prior to joining USC, he was at the IBM T. J. Watson Research Center, Yorktown Heights, NY. He has received numerous awards, including the NSF CAREER award, the ONR Young Investigator award, an IBM Faculty award, and the James H. Zumberge Faculty Research and Innovation Award, and is currently a US Fulbright Specialist Scholar. His interests span reinforcement learning, stochastic control, statistical learning, stochastic networks, and game theory, with power systems and healthcare on the applications side. The talk is based on work with a number of outstanding students and postdocs who are now faculty members themselves at top places.