Planning with Uncertain Specifications

Abstract

Reward engineering is crucial to high performance in reinforcement learning systems. Prior research into reward design has largely focused on Markovian functions representing the reward. While there has been research into expressing non-Markovian rewards as linear temporal logic (LTL) formulas, this has been limited to a single formula serving as the task specification. However, in many real-world applications, task specifications can only be expressed as a belief over LTL formulas. In this paper, we introduce planning with uncertain specifications (PUnS), a novel formulation that addresses the challenge posed by non-Markovian specifications expressed as beliefs over LTL formulas. We present four criteria that capture the semantics of satisfying a belief over specifications for different applications, and analyze the implications of these criteria within a synthetic domain. We demonstrate the existence of an equivalent Markov decision process (MDP) for any instance of PUnS.

Publication
Robotics: Science and Systems, Workshop on Combining Learning and Reasoning – Towards Human-Level Robot Intelligence