Nonparametric Bayesian Approaches for Reinforcement Learning in Partially Observable Environments

The partially-observable Markov decision process (POMDP) framework has been successful in many planning domains where the agent must choose between actions that maximize immediate reward and actions that provide information about the environment. Unfortunately, POMDPs are defined by a large number of parameters that are difficult to set a priori; gathering enough training data may also be prohibitively expensive. Thus, it is most realistic to consider that an agent acting in a partially observable environment will have to do some learning, that is, determine some of the world's properties in an online manner through its interactions with the environment.

Reinforcement learning in partially observable domains is a difficult problem, however, as the agent never has access to the true state of the environment. In these situations, Bayesian approaches are useful as they help the agent make decisions based on distributions of possible environmental models (instead of just considering a single model). The priors associated with Bayesian techniques typically guide the use of the data to learn models more quickly than other approaches. Nonparametric approaches can alleviate the issues of model size by allowing the model to grow as more data is observed. Growing complexity in a data-directed fashion is particularly attractive for online settings where computational costs are an important consideration; nonparametric approaches can help the agent ignore parameters for which it has no data to model.

While often data-efficient, the use of Bayesian nonparametric (and often just Bayesian) approaches to reinforcement learning have seen limited use due to their high computational complexity. The contributions of this work are two-fold: First, we will develop Bayesian nonparametric models that are particularly suited for reinforcement learning applications. Second, we will develop efficient, online algorithms for using these models that will allow these models to be applied to realworld scenarios. The expected contributions are outlined below: