Seminar | September 21 | 3-4 p.m. | 250 Sutardja Dai Hall
Tuomas Haarnoja, UC Berkeley
The intersection of expressive, general-purpose function approximators, such as neural networks, with general-purpose model-free reinforcement learning (RL) algorithms holds the promise of automating a wide range of robotic behaviors: reinforcement learning provides the formalism for reasoning about sequential decision making, while large neural networks can process high-dimensional and noisy observations to provide a general representation for any behavior with minimal manual engineering. However, applying model-free RL algorithms with multilayer neural networks (i.e., deep RL) to real-world robotic control problems has proven to be very difficult in practice: the sample complexity of model-free methods tends to be quite high, and training tends to yield high-variance results. In this talk, I will discuss how the maximum entropy principle can enable deep RL for real-world robotic applications. First, by representing policies as expressive energy-based models, maximum entropy RL leads to effective, multi-modal exploration that can reduce sample complexity. Second, maximum entropy policies can promote reusability through compositionality, meaning that existing policies can be combined to create new compound policies without extra interaction with the environment. Third, policies expressed via invertible transformations lead to the natural formation of policy hierarchies that can be used to solve sparse-reward tasks. I will demonstrate these properties on both simulated and real-world robot tasks.