Imagine showing a friend how to play your favorite quest-based video game. A mission in such a game might comprise multiple sub-quests that must all be completed to finish that level. In this scenario, your friend would likely understand what needs to be done to complete the mission well before actually being able to play the game effectively. Likewise, when learning from demonstrations, human apprentices can identify whether a task is being executed correctly well before gaining expertise in that task. Most current approaches to learning from demonstration frame this problem as one of learning a reward function or policy within a Markov decision process; however, specifying acceptable behaviors through reward functions and policies remains an open problem. Temporal logics are widely used as a language for expressing desirable system behaviors, and specifications become more interpretable when expressed as compositions of simpler templates.
The flexibility of linear temporal logic (LTL) for specifying behaviors also poses a key challenge for inference: the hypothesis space is very large. We address this by restricting the hypothesis space to logical compositions of relevant behavior templates, thereby encoding useful inductive biases. A second challenge is the inherent ambiguity of the task: multiple formulas may explain all of the observed demonstrations equally well. To address this, we propose a Bayesian formulation of specification inference whose output is not a single formula but a posterior distribution over formulas conditioned on the observed demonstrations, thus explicitly representing the ambiguity and the uncertainty associated with it.
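The Bayesian formulation described above can be illustrated with a minimal sketch. This is not the authors' implementation: the hypothesis space (two "eventually" and two ordering templates), the uniform prior, the toy traces, and the noise model in which a demonstrator violates the true specification with small probability `EPS` are all assumptions made for illustration. Note how the resulting posterior assigns equal mass to several formulas that the demonstrations cannot distinguish, which is exactly the ambiguity the posterior is meant to capture.

```python
# Sketch of Bayesian specification inference over a template-restricted
# hypothesis space. Each demonstration is a finite trace: a sequence of
# sets of atomic propositions that hold at each step.
demos = [
    [{"a"}, {"a", "b"}, {"c"}],
    [{"a"}, {"b"}, {"b", "c"}],
]

# Candidate behavior templates, each encoded as a boolean check on a trace.
def eventually(p):
    # "F p": p holds at some step of the trace.
    return lambda trace: any(p in step for step in trace)

def before(p, q):
    # p must occur strictly before the first occurrence of q.
    def check(trace):
        for step in trace:
            if q in step:
                return False
            if p in step:
                return True
        return False
    return check

hypotheses = {
    "F a": eventually("a"),
    "F c": eventually("c"),
    "a before c": before("a", "c"),
    "c before a": before("c", "a"),
}

EPS = 0.05  # assumed probability that a demonstration violates the true spec

def posterior(demos, hypotheses):
    # Uniform prior over formulas; each satisfying demonstration contributes
    # likelihood (1 - EPS), each violating one contributes EPS.
    scores = {}
    for name, check in hypotheses.items():
        likelihood = 1.0
        for trace in demos:
            likelihood *= (1.0 - EPS) if check(trace) else EPS
        scores[name] = likelihood
    z = sum(scores.values())
    return {name: s / z for name, s in scores.items()}

post = posterior(demos, hypotheses)
```

Here "F a", "F c", and "a before c" are satisfied by both toy traces and so receive equal posterior mass, while "c before a" is heavily penalized; the distribution, rather than a single point estimate, records which specifications remain plausible.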