Safety, robustness, and interpretability are among the major challenges in developing systems for human-robot interaction. Learning from demonstrations (LfD) is a popular paradigm for obtaining effective robot control policies for complex tasks via reinforcement learning, without the need to explicitly design reward functions. However, this paradigm typically requires large datasets of user demonstrations. It is also susceptible to imperfections in the demonstrations and raises concerns about the safety and interpretability of the learned control policies. To address these issues, I will describe how Signal Temporal Logic (STL) can be used to express mission-level specifications for the robotic system, which are then used to evaluate and rank the quality of demonstrations. I will then show how these evaluations and rankings can be used to infer reward functions from only a handful of possibly imperfect demonstrations; these reward functions are subsequently used by reinforcement learning algorithms to obtain control policies that conform to the STL specifications. Finally, I will present our recent work on extracting STL-based graphs that provide intuitive explanations of demonstrators' behaviors, which are then used to improve the reward and policy via apprenticeship learning.
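To make the ranking idea concrete, here is a minimal sketch (not the speaker's implementation) of how STL robustness values can be used to rank demonstrations. A trajectory's robustness with respect to a specification such as "always avoid the obstacle and eventually reach the goal" is a real number, positive when the specification is satisfied with margin and negative when it is violated, so sorting demonstrations by robustness yields a quality ranking. The specification, obstacle and goal parameters, and synthetic trajectories below are all illustrative assumptions.

```python
# Minimal sketch: rank demonstrations by robustness of an STL-style spec.
# Spec (illustrative): G(dist(x, obstacle) > radius) AND F(dist(x, goal) < tol)
import numpy as np

def rob_always_avoid(traj, obstacle, radius):
    """Robustness of 'always avoid': min over time of (distance - radius)."""
    d = np.linalg.norm(traj - obstacle, axis=1)
    return np.min(d - radius)

def rob_eventually_reach(traj, goal, tol):
    """Robustness of 'eventually reach': max over time of (tol - distance)."""
    d = np.linalg.norm(traj - goal, axis=1)
    return np.max(tol - d)

def robustness(traj, obstacle, radius, goal, tol):
    # Conjunction of subformulas: minimum of their robustness values.
    return min(rob_always_avoid(traj, obstacle, radius),
               rob_eventually_reach(traj, goal, tol))

# Hypothetical 2-D demonstrations: each is an array of (x, y) waypoints.
rng = np.random.default_rng(0)
demos = [np.cumsum(rng.normal(0.1, 0.3, size=(50, 2)), axis=0) for _ in range(5)]

obstacle, radius = np.array([2.0, 2.0]), 0.5
goal, tol = np.array([5.0, 5.0]), 1.0

scores = np.array([robustness(d, obstacle, radius, goal, tol) for d in demos])
ranking = np.argsort(scores)[::-1]  # most robust (highest quality) demo first
print("ranking:", ranking, "scores:", np.round(scores[ranking], 3))
```

In the talk's pipeline, a ranking of this kind replaces hand-designed rewards: relative robustness scores over a handful of demonstrations supply the preference signal from which a reward function is inferred.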
Aniruddh Puranic is a PhD candidate in the Department of Computer Science at the University of Southern California (USC). He is advised by Jyotirmoy (Jyo) Deshmukh of the CPS-VIDA lab and Stefanos Nikolaidis of the ICAROS lab. His research interests are broadly in the integration of formal methods and robot learning for safe and efficient human-robot interaction. He also holds a master’s degree in Computer Science (specializing in Intelligent Robotics) from USC.