Publications• Sorted by Date • Classified by Publication Type • Classified by Research Category • Best-response play in partially observable card gamesFrans Oliehoek, Matthijs T. J. Spaan, and Nikos Vlassis. Best-response play in partially observable card games. In Benelearn 2005: Proceedings of the 14th Annual Machine Learning Conference of Belgium and the Netherlands, pp. 45–50, February 2005. DownloadAbstractWe address the problem of how to play optimally against a fixedopponent in a two-player card game with partial information likepoker. A game theoretic approach to this problem would specify a pairof stochastic policies that are best-responses to each other, i.e., aNash equilibrium. Although such a Nash-optimal policy guarantees alower bound to the attainable payoff against any opponent, it may not necessarily be optimal against a fixed opponent. We show here that if the opponent's policy is fixed (either known or estimated by repeatedplay), then we can model the problem as a partially observable Markovdecision process (POMDP) from the perspective of one agent, and solveit by dynamic programming. In particular, for a large class of cardgames including poker, the derived POMDP consists of a finite numberof belief states and it can be solved exactly. The resulting policy isguaranteed to be optimal even against a Nash-optimal policy. Weprovide experimental results to support our claims, using a simplified8-card poker game in which Nash-policies can be computed efficiently. BibTeX Entry@InProceedings{Oliehoek05Benelearn,
author = {Frans Oliehoek and Matthijs T. J. Spaan and Nikos
Vlassis},
title = {Best-response play in partially observable card
games},
booktitle = {Benelearn 2005: Proceedings of the 14th Annual
Machine Learning Conference of Belgium and the
Netherlands},
month = feb,
year = 2005,
pages = {45--50},
abstract = {
We address the problem of how to play optimally against a fixed
opponent in a two-player card game with partial information like
poker. A game theoretic approach to this problem would specify a pair
of stochastic policies that are best-responses to each other, i.e., a
Nash equilibrium. Although such a Nash-optimal policy guarantees a
lower bound to the attainable payoff against any opponent, it may not
necessarily be optimal against a fixed opponent. We show here that if
the opponent's policy is fixed (either known or estimated by repeated
play), then we can model the problem as a partially observable Markov
decision process (POMDP) from the perspective of one agent, and solve
it by dynamic programming. In particular, for a large class of card
games including poker, the derived POMDP consists of a finite number
of belief states and it can be solved exactly. The resulting policy is
guaranteed to be optimal even against a Nash-optimal policy. We
provide experimental results to support our claims, using a simplified
8-card poker game in which Nash-policies can be computed efficiently.}
}
Generated by
bib2html.pl
(written by Patrick Riley) on
Wed Nov 06, 2013 16:37:08 UTC
|