Dissertation Talk: Sample Complexity Bounds for the Linear Quadratic Regulator

Seminar: Dissertation Talk: EE | May 1 | 3-4 p.m. | 380 Soda Hall

 Stephen Tu, UC Berkeley

 Electrical Engineering and Computer Sciences (EECS)

Reinforcement learning (RL) has demonstrated impressive performance in various domains such as video games, Go, robotic locomotion, and manipulation tasks. As we turn towards RL to power autonomous systems in the physical world, a natural question to ask is, how do we ensure that the behavior observed in the laboratory reflects the behavior that occurs when systems are deployed in the real world? How much data do we need to collect in order to learn how to control a system with a high degree of confidence? This thesis takes a step towards answering these questions by establishing the Linear Quadratic Regulator (LQR) as a baseline for comparison of RL algorithms.

The first part of this thesis focuses on model-based algorithms which estimate a model of the underlying system, and then build a controller based on the estimated dynamics. We show that the classic certainty equivalence controller, which discards confidence intervals surrounding the estimated dynamics, is efficient in regimes of low uncertainty. For regimes of moderate uncertainty, we propose a new model-based algorithm based on robust optimization, and show that it also sample efficient.

The second part studies model-free algorithms which learn intermediate representations instead, or directly search for the parameters of the optimal controller. We use tools from asymptotic statistics to characterize the behavior of both the certainty equivalence controller and the popular policy gradient method on a particular family of LQR instances. This comparison reveals that the model-free policy gradient method has polynomial in state/input dimension and horizon length worse sample complexity than the model-based certainty equivalence controller. Our experiments corroborate this finding and show that model-based algorithms are more sample efficient than model-free algorithms for LQR.