Leveraging Temporal Structure in Task Specifications for POMDP Planning

Abstract

Planning sequential actions in a partially observable environment while satisfying temporal constraints is a challenging yet essential capability for many robotic applications. A constrained natural language command such as “Find the new apartment complex while avoiding the park” is difficult for an autonomous delivery drone to understand. Previous planning methods either sacrificed generality for optimality and efficiency in large state-action spaces by using domain- and task-specific action heuristics, or used a full-width backup planner that did not scale well. We represent a set of constrained task specifications as linear temporal logic (LTL) expressions and present a new sampling-based planner for partially observable Markov decision processes (POMDPs), LTL-POMCP, that leverages structured constraints for efficient planning by constructing a shaping term that biases action selection towards achieving the subgoals of an LTL expression. We augment an environment POMDP with an LTL task specification, then use LTL-POMCP to efficiently solve the resulting composite POMDP. Quantitative results show that LTL-POMCP can efficiently solve LTL tasks in various domains and scale to large environments. We demonstrate the first end-to-end system from temporally constrained natural language to robot policies in partially observable maps, both in simulation, with up to 50% improvement in wall-clock time, and on a mobile manipulator in the real world.
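
To make the idea of a composite model concrete, the following is a minimal sketch, not the LTL-POMCP implementation. It assumes a toy five-cell corridor, a hand-written three-state automaton for the example command (reach the apartment, never enter the park), and a potential-based shaping term over automaton states. The corridor, the labels, the potential values, and every function name are hypothetical, and the sketch shows only the underlying product-state dynamics, with no observations or belief tracking.

```python
# Hypothetical toy example: product of an environment with a task automaton,
# plus a potential-based shaping term. All names and values are illustrative.

GAMMA = 0.95  # assumed discount factor

# Assumed labels for a 5-cell corridor: park at cell 0, apartment at cell 4.
LABELS = {
    0: frozenset({"park"}),
    1: frozenset(),
    2: frozenset(),
    3: frozenset(),
    4: frozenset({"apartment"}),
}

# Automaton states: 0 = still searching, 1 = task satisfied, 2 = constraint violated.
# Assumed potential: higher means closer to satisfying the specification.
POTENTIAL = {0: 0.0, 1: 1.0, 2: -1.0}


def automaton_step(q, labels):
    """Advance the task automaton given the labels of the newly entered cell."""
    if q != 0:
        return q  # satisfied and violated states are absorbing
    if "park" in labels:
        return 2
    if "apartment" in labels:
        return 1
    return 0


def env_step(s, a):
    """Deterministic corridor dynamics; a is -1 (left) or +1 (right)."""
    return max(0, min(4, s + a))


def product_step(state, a):
    """One transition of the composite (environment cell, automaton state) model."""
    s, q = state
    s_next = env_step(s, a)
    q_next = automaton_step(q, LABELS[s_next])
    return (s_next, q_next)


def shaping(q, q_next):
    """Potential-based shaping bonus for moving between automaton states."""
    return GAMMA * POTENTIAL[q_next] - POTENTIAL[q]


if __name__ == "__main__":
    state = (2, 0)  # start in the middle cell, task not yet satisfied
    for action in (+1, +1):
        nxt = product_step(state, action)
        print(state, action, nxt, shaping(state[1], nxt[1]))
        state = nxt
```

In this sketch, a sampling-based planner would simulate `product_step` instead of the bare environment and add `shaping` to the simulated reward, so that rollouts progressing the automaton toward satisfaction score higher than rollouts that stall or violate the constraint.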

Publication
Conference on Reinforcement Learning and Decision Making