Chris Amato

Research Scientist
CSAIL and LIDS, MIT
camato at csail dot mit dot edu


Research:

Optimal coordination of decentralized agents

Coordination under uncertainty is difficult, even in cooperative systems. When communication is unreliable, slow, or costly, agents must rely on local information to make decisions. These cooperative systems can be modeled as decentralized partially observable Markov decision processes (Dec-POMDPs). Dec-POMDPs are very general and can represent agent teams with stochastic actions and partial information. One example is a team of vehicles, such as Mars rovers or reconnaissance drones, optimizing exploration while minimizing fuel and communication usage.
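
Formally, a Dec-POMDP is a tuple ⟨I, S, {A_i}, T, R, {Ω_i}, O⟩: a set of agents, hidden states, per-agent actions and observations, joint dynamics, and a single shared team reward. As a concrete point of reference, here is a minimal sketch of that tuple as a Python container; the field names are illustrative and not tied to any particular solver's API:

from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class DecPOMDP:
    # I: the agents (e.g., ["rover1", "rover2"])
    agents: List[str]
    # S: hidden world states
    states: List[str]
    # A_i: each agent's local action set
    actions: Dict[str, List[str]]
    # Omega_i: each agent's local observation set
    observations: Dict[str, List[str]]
    # T(s, joint_action, s2) -> probability of reaching s2
    transition: Callable[[str, Tuple[str, ...], str], float]
    # O(joint_action, s2, joint_obs) -> probability of the joint observation
    observation_fn: Callable[[Tuple[str, ...], str, Tuple[str, ...]], float]
    # R(s, joint_action) -> single shared team reward
    reward: Callable[[str, Tuple[str, ...]], float]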

My research has made several contributions to generating optimal and bounded-optimal Dec-POMDP solutions, including:
  • The first (and only) epsilon-optimal solution method for infinite-horizon Dec-POMDPs [JAIR 09]
  • Using a centralized planning stage with decentralized execution to transform (finite-horizon) Dec-POMDPs into continuous-state MDPs, allowing powerful centralized (POMDP) methods to be used [IJCAI 13] (see the sketch after this list)
  • An efficient dynamic programming algorithm that can automatically take advantage of state-space structure [ICAPS 09]
  • The most scalable optimal (top-down) heuristic search technique for finite-horizon Dec-POMDPs [JAIR 13]
These algorithms represent the most scalable bounded and optimal methods for Dec-POMDPs, showing that optimal methods can solve large, interesting problems.
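
To make the [IJCAI 13] transformation above concrete: during centralized planning, the relevant "state" is a distribution over hidden states and joint action-observation histories, and this distribution evolves given the agents' decision rules, yielding a continuous-state MDP. The sketch below (our illustration, using the DecPOMDP container from earlier rather than the paper's code) computes one step of that distribution update:

from collections import defaultdict
from itertools import product

def next_plan_time_distribution(dist, decision_rules, model):
    # dist: dict mapping (hidden_state, per_agent_histories) -> probability.
    # decision_rules: dict mapping each agent to a map from its local
    #                 history to its local action (all names illustrative).
    # model: the DecPOMDP container sketched earlier.
    new_dist = defaultdict(float)
    for (s, hists), p in dist.items():
        # Decentralized execution: each agent acts on its own history only.
        joint_a = tuple(decision_rules[k][h] for k, h in zip(model.agents, hists))
        for s2 in model.states:
            p_t = model.transition(s, joint_a, s2)
            if p_t == 0.0:
                continue
            for joint_o in product(*(model.observations[k] for k in model.agents)):
                p_o = model.observation_fn(joint_a, s2, joint_o)
                if p_o == 0.0:
                    continue
                new_hists = tuple(h + ((a, o),)
                                  for h, a, o in zip(hists, joint_a, joint_o))
                new_dist[(s2, new_hists)] += p * p_t * p_o
    return dict(new_dist)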

Optimizing agent performance with limited resources

When only a single agent is present or centralization is possible, POMDPs are a natural model for sequential decision-making with action and sensor uncertainty. In both POMDPs and Dec-POMDPs, solutions can become large and complicated, and searching through and representing these solutions can be computationally intractable.

My research has developed fixed-memory approaches that can often provide scalable, high-quality solutions:
  • A fixed-memory formulation for (infinite-horizon) POMDPs that is optimal with respect to the fixed size, along with an optimization approach for finding approximate solutions [IJCAI 07]
  • A fixed-memory formulation and solution method for Dec-POMDPs [UAI 07]
  • Additional improvements, including more scalable solution methods [JAAMAS 09] and a technique that automatically discovers and exploits problem structure [AAAI 10]
These approaches provide concise solutions that balance solution quality against available resources, resulting in the best known performance on a number of common problems.
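
These fixed-memory solutions take the form of finite-state controllers: each controller node selects actions stochastically, and observations drive transitions between nodes, so memory is bounded by the number of nodes. Here is a minimal sketch of evaluating a fixed-size controller on a tabular POMDP (array names and shapes are our assumptions, not any paper's code):

import numpy as np

def evaluate_fsc(T, O, R, psi, eta, gamma=0.95):
    # Assumed tabular arrays:
    #   T[s, a, s2]       transition probabilities
    #   O[a, s2, o]       observation probabilities
    #   R[s, a]           rewards
    #   psi[q, a]         action distribution at controller node q
    #   eta[q, a, o, q2]  controller node transitions
    # Returns V[q, s], solving the linear system
    #   V(q,s) = sum_a psi(a|q) [ R(s,a) + gamma * sum_{s2,o,q2}
    #            T(s2|s,a) O(o|a,s2) eta(q2|q,a,o) V(q2,s2) ].
    Q, A = psi.shape
    S = T.shape[0]
    n = Q * S
    # Immediate expected reward for each (q, s) pair.
    r = (psi @ R.T).reshape(n)
    # Transition matrix of the (q, s) Markov chain under the controller.
    P = np.zeros((n, n))
    for q in range(Q):
        for s in range(S):
            for a in range(A):
                for s2 in range(S):
                    for o in range(O.shape[2]):
                        for q2 in range(Q):
                            P[q * S + s, q2 * S + s2] += (
                                psi[q, a] * T[s, a, s2] * O[a, s2, o] * eta[q, a, o, q2]
                            )
    V = np.linalg.solve(np.eye(n) - gamma * P, r)
    return V.reshape(Q, S)

Searching over the controller parameters psi and eta of a given size, for example with nonlinear optimization, then trades solution quality against the memory bound.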

Improving scalability in algorithms for uncertain decentralized systems

Although finding optimal solutions for Dec-POMDPs can be difficult, we can often find high-quality approximate solutions, and some problems permit more scalability due to their structure. My research has developed:
  • A sample-based algorithm with bounded optimality for finite-horizon and goal-based Dec-POMDPs [AAMAS 09]
  • Scalable solution methods that assume additional structure in the form of independence in agent dynamics [UAI 12, AAMAS 13] (see the sketch below)
These methods solve significantly larger problems in terms of states and agents while retaining solution quality bounds.
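
As a toy illustration of the kind of structure exploited in [UAI 12, AAMAS 13]: when agent dynamics are independent, the joint transition probability factors into a product of local terms, so the model and many value computations scale with the number of agents rather than with the exponentially large joint space (local_T is an assumed list of per-agent transition functions):

from math import prod

def joint_transition_prob(local_T, s, joint_a, s2):
    # With independent agent dynamics, P(s2 | s, a) factors per agent,
    # so we never enumerate the joint transition model explicitly.
    return prod(T_i(s[i], joint_a[i], s2[i]) for i, T_i in enumerate(local_T))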

Balancing time and accuracy in video surveillance

With the profusion of video data, automated surveillance and intrusion detection are becoming closer to reality. To provide a timely response while limiting false alarms, an intrusion detection system must balance resources (e.g., time) and accuracy.

We showed how such a system can be modeled with a partially observable Markov decision process (POMDP), representing possible computer vision filters and their costs in a way that is similar to human vision systems [IAAI 12].
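
As a rough sketch of the underlying model (ours, not the system from the paper): the hidden state is whether an intrusion is present, each POMDP action runs a vision filter with some time cost, and each filter's noisy output updates the belief by Bayes' rule until the system is confident enough to raise or dismiss an alarm:

def belief_update(belief, confusion, obs):
    # belief: dict over {"intrusion", "clear"}; confusion[state][obs] is an
    # assumed per-filter confusion matrix (illustrative names and numbers).
    posterior = {s: belief[s] * confusion[s].get(obs, 0.0) for s in belief}
    z = sum(posterior.values())
    return {s: p / z for s, p in posterior.items()} if z > 0 else belief

# Example: a cheap motion filter that is right 80% of the time.
motion_filter = {"intrusion": {"hit": 0.8, "miss": 0.2},
                 "clear":     {"hit": 0.2, "miss": 0.8}}
belief = {"intrusion": 0.5, "clear": 0.5}
belief = belief_update(belief, motion_filter, "hit")  # -> intrusion: 0.8

The POMDP policy then weighs running another, possibly costlier, filter against deciding immediately, which is the time/accuracy balance described above.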

Machine learning for video games

Video games provide a rich testbed for artificial intelligence methods. In particular, creating automated opponents that perform well in strategy games is a difficult task. For instance, human players rapidly discover and exploit the weaknesses of hard-coded strategies.

To build better strategies, we developed a reinforcement learning approach for learning a policy that switches between high-level strategies [AAMAS 10].
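
A minimal sketch of the general idea (ours, not the paper's implementation): tabular Q-learning in which each "action" commits to one high-level strategy for a macro-step of the game; env is an assumed interface with reset() -> state and step(strategy_index) -> (state, reward, done):

import random
from collections import defaultdict

def q_learning_strategy_switching(env, strategies, episodes=500,
                                  alpha=0.1, gamma=0.99, epsilon=0.1):
    # Q[state][i] estimates the value of committing to strategies[i];
    # states must be hashable game-state summaries (an assumption here).
    Q = defaultdict(lambda: [0.0] * len(strategies))
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Epsilon-greedy choice among the high-level strategies.
            a = (random.randrange(len(strategies)) if random.random() < epsilon
                 else max(range(len(strategies)), key=lambda i: Q[s][i]))
            s2, r, done = env.step(a)  # play strategies[a] for one macro-step
            target = r + gamma * (0.0 if done else max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q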

Coordination in multi-robot systems

We are currently working with a team of Turtlebots on problems such as multi-robot navigation among movable obstacles (NAMO), exploration, and simulated warehouse tasks. These problems are modeled as Dec-POMDPs, and we are using solution methods based on those discussed above. We also plan to test these approaches on a set of quadrotors in surveillance tasks.