Dissertation Talk: Approaches to Safety in Inverse Reinforcement Learning

Presentation | December 10 | 12-1 p.m. | 337B Cory Hall

 Dexter Scobee

 Electrical Engineering and Computer Sciences (EECS)

Title: Approaches to Safety in Inverse Reinforcement Learning
Speaker: Dexter Scobee
Advisor: S. Shankar Sastry

Date: Tuesday, December 10th, 2019
Time: 12:00pm - 1:00pm
Location: Cory 337B

Abstract:
As the capabilities of robotic systems increase, we move closer to the vision of ubiquitous robotic assistance throughout our everyday lives. In transitioning robots and autonomous systems from traditional factory and industrial settings, it is critical that these systems are able to adapt to uncertain environments and the humans who populate them. In order to better understand and predict the behavior of these humans, Inverse Reinforcement Learning (IRL) uses demonstrations to infer the underlying motivations driving human actions. The information gained from IRL can be used to improve a robot’s understanding of the environment as well as to allow the robot to better interact with or assist humans.

In this talk, we address the challenge of incorporating safety into the application of IRL. We first consider safety in the context of using IRL for assisting humans in shared control tasks. Through a user study, we show how incorporating haptic feedback into human assistance can increase humans’ sense of control while improving safety in the presence of imperfect learning. Further, we present our method for using IRL to automatically create such haptic feedback policies from task demonstrations.

We further address safety in IRL by incorporating notions of safety directly into the learning process. Currently, most work on IRL focuses on learning explanatory rewards that humans are modeled as optimizing. However, pure reward optimization can fail to effectively capture hard requirements, such as safety constraints. We draw on the definition of safety from Hamilton-Jacobi reachability analysis to infer human perceptions of safety and to modify robot behavior to respect these learned safety constraints. We also extend this work on learning constraints by adapting the framework of Maximum Entropy IRL in order to learn hard constraints given nominal task rewards, and we show how this technique infers the most likely constraints to align expected behavior with observed demonstrations.

 dscobee@berkeley.edu