Publications• Sorted by Date • Classified by Publication Type • Classified by Research Category • Best-response play in partially observable card gamesFrans Oliehoek, Matthijs T. J. Spaan, and Nikos Vlassis. Best-response play in partially observable card games. In Benelearn 2005: Proceedings of the 14th Annual Machine Learning Conference of Belgium and the Netherlands, pp. 45–50, February 2005. DownloadAbstractWe address the problem of how to play optimally against a fixedopponent in a two-player card game with partial information likepoker. A game theoretic approach to this problem would specify a pairof stochastic policies that are best-responses to each other, i.e., aNash equilibrium. Although such a Nash-optimal policy guarantees alower bound to the attainable payoff against any opponent, it may not necessarily be optimal against a fixed opponent. We show here that if the opponent's policy is fixed (either known or estimated by repeatedplay), then we can model the problem as a partially observable Markovdecision process (POMDP) from the perspective of one agent, and solveit by dynamic programming. In particular, for a large class of cardgames including poker, the derived POMDP consists of a finite numberof belief states and it can be solved exactly. The resulting policy isguaranteed to be optimal even against a Nash-optimal policy. Weprovide experimental results to support our claims, using a simplified8-card poker game in which Nash-policies can be computed efficiently. BibTeX Entry@InProceedings{Oliehoek05Benelearn, author = {Frans Oliehoek and Matthijs T. J. Spaan and Nikos Vlassis}, title = {Best-response play in partially observable card games}, booktitle = {Benelearn 2005: Proceedings of the 14th Annual Machine Learning Conference of Belgium and the Netherlands}, month = feb, year = 2005, pages = {45--50}, abstract = { We address the problem of how to play optimally against a fixed opponent in a two-player card game with partial information like poker. A game theoretic approach to this problem would specify a pair of stochastic policies that are best-responses to each other, i.e., a Nash equilibrium. Although such a Nash-optimal policy guarantees a lower bound to the attainable payoff against any opponent, it may not necessarily be optimal against a fixed opponent. We show here that if the opponent's policy is fixed (either known or estimated by repeated play), then we can model the problem as a partially observable Markov decision process (POMDP) from the perspective of one agent, and solve it by dynamic programming. In particular, for a large class of card games including poker, the derived POMDP consists of a finite number of belief states and it can be solved exactly. The resulting policy is guaranteed to be optimal even against a Nash-optimal policy. We provide experimental results to support our claims, using a simplified 8-card poker game in which Nash-policies can be computed efficiently.} }
Generated by
bib2html.pl
(written by Patrick Riley) on
Wed Nov 06, 2013 16:37:08 UTC
|