Sufficient Plan-Time Statistics for Decentralized POMDPs

Frans A. Oliehoek. Sufficient Plan-Time Statistics for Decentralized POMDPs. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, pp. 302–308, 2013.

Abstract

Optimal decentralized decision making in a team of cooperative agents as formalized in the framework of Decentralized POMDPs is a notoriously hard problem. A major obstacle is that the agents do not have access to a sufficient statistic during execution, which means that agents need to base their actions on their histories of observations. A consequence is that even during off-line planning the choice of decision rules for different stages is tightly interwoven: decisions of earlier stages affect how to act optimally at later stages, and the optimal value function for a stage is known to have a dependence on the decisions made up to that point. This paper makes a contribution to the theory of decentralized POMDPs by showing how this dependence on the 'past joint policy' can be replaced by a probability distribution over histories and potentially states. That is, it introduces sufficient statistics for the past joint policy during the optimal planning process. These results are extended to the case of k-step delayed communication. We investigate the practical implications in a number of benchmark problems and discuss future avenues of research opened by these contributions.

BibTeX Entry

@inproceedings{Oliehoek13IJCAI,
author = {Frans A. Oliehoek},
title = {Sufficient Plan-Time Statistics for Decentralized {POMDPs}},
booktitle = ijcai13,
year = 2013,
pages = {302--308},
note = {},
abstract = {
Optimal decentralized decision making in a team of cooperative agents
as formalized in the framework of Decentralized POMDPs is a
notoriously hard problem. A major obstacle is that the agents do
not have access to a sufficient statistic during execution, which
means that agents need to base their actions on their histories of
observations. A consequence is that even during off-line planning
the choice of decision rules for different stages is tightly
interwoven: decisions of earlier stages affect how to act
optimally at later stages, and the optimal value function for a
stage is known to have a dependence on the decisions made up to
that point. This paper makes a contribution to the theory of
decentralized POMDPs by showing how this dependence on the `past
joint policy' can be replaced by a probability distribution over
histories and potentially states. That is, it introduces
sufficient statistics for the past joint policy during the optimal
planning process. These results are extended to the case of
k-step delayed communication. We investigate the practical
implications in a number of benchmark problems and discuss future
avenues of research opened by these contributions.
}
}