Publications

Sufficient Plan-Time Statistics for Decentralized POMDPs

Frans A. Oliehoek. Sufficient Plan-Time Statistics for Decentralized POMDPs. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, pp. 302–308, 2013.

Abstract

Optimal decentralized decision making in a team of cooperative agents as formalized in the framework of Decentralized POMDPs is a notoriously hard problem. A major obstacle is that the agents do not have access to a sufficient statistic during execution, which means that agents need to base their actions on their histories of observations. A consequence is that even during off-line planning the choice of decision rules for different stages is tightly interwoven: decisions of earlier stages affect how to act optimally at later stages, and the optimal value function for a stage is known to have a dependence on the decisions made up to that point. This paper makes a contribution to the theory of decentralized POMDPs by showing how this dependence on the 'past joint policy' can be replaced by a probability distribution over histories and potentially states. That is, it introduces sufficient statistics for the past joint policy during the optimal planning process. These results are extended to the case of k-step delayed communication. We investigate the practical implications in a number of benchmark problems and discuss future avenues of research opened by these contributions.

BibTeX Entry

@inproceedings{Oliehoek13IJCAI,
    author =    {Frans A. Oliehoek},
    title =     {Sufficient Plan-Time Statistics for Decentralized {POMDPs}},
    booktitle = ijcai13,
    year =      2013,
    pages =     {302--308},
    note =      {},
    abstract = {
    Optimal decentralized decision making in a team of cooperative agents
    as formalized in the framework of Decentralized POMDPs is a
    notoriously hard problem. A major obstacle is that the agents do
    not have access to a sufficient statistic during execution, which
    means that agents need to base their actions on their histories of
    observations. A consequence is that even during off-line planning
    the choice of decision rules for different stages is tightly
    interwoven: decisions of earlier stages affect how to act
    optimally at later stages, and the optimal value function for a
    stage is known to have a dependence on the decisions made up to
    that point. This paper makes a contribution to the theory of
    decentralized POMDPs by showing how this dependence on the `past
    joint policy' can be replaced by a probability distribution over
    histories and potentially states.  That is, it introduces
    sufficient statistics for the past joint policy during the optimal
    planning process. These results are extended to the case of
    k-step delayed communication. We investigate the practical
    implications in a number of benchmark problems and discuss future
    avenues of research opened by these contributions.
    }
}
