Optimization

*Wolpert and Macready (1997): "No Free Lunch Theorems for Optimization" pdf

Bandits

*Badanidiyuru et al. (2013): "Bandits with Knapsacks" link
Boyce et al. (2009): "Be My Guinea Pig: Information Spillovers in a One-Armed Bandit Game" link
summary
This paper studies a simple collaborative bandit game in which information is shared publicly, so that exploration acts as a public good, and tests the extent to which humans play the Nash equilibrium in this context. The experiment consists of a two-player, two-period, one-armed bandit problem in which each player also has a safe option, with the safe return possibly varying across players (are the safe returns commonly known?). The authors identify systematic deviations from optimal play, including intermediate levels of free-riding and sticky choice behavior.
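
A minimal simulation sketch of this game structure (not the paper's exact experimental design): the risky arm pays 1 with unknown probability p under a Beta(1,1) prior, all outcomes are public, and each player follows a myopic rule that pulls the risky arm only when its posterior mean beats that player's safe return. The payoff values below are made-up assumptions for illustration.

    import random

    def posterior_mean(successes, pulls):
        # Posterior mean of p under a Beta(1, 1) prior, given the
        # publicly observed history of pulls and successes.
        return (1 + successes) / (2 + pulls)

    def play_game(p, safe, rng):
        # p: true success probability of the risky arm (pays 0 or 1).
        # safe: each player's certain return, possibly unequal.
        successes, pulls = 0.0, 0
        rewards = [0.0, 0.0]
        for period in range(2):
            est = posterior_mean(successes, pulls)  # shared information
            for i in range(2):
                # Myopic rule: pull the risky arm iff its posterior mean
                # beats the safe return. Ignoring the option value of
                # information is what makes free-riding tempting here.
                if est > safe[i]:
                    r = 1.0 if rng.random() < p else 0.0
                    pulls += 1
                    successes += r
                    rewards[i] += r
                else:
                    rewards[i] += safe[i]
        return rewards

    rng = random.Random(0)
    # Player 0's low safe return makes it the natural explorer; player 1
    # can free-ride on the information player 0's pull generates.
    print(play_game(p=0.7, safe=(0.4, 0.6), rng=rng))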

Bull (2013): "Adaptive-treed bandits" pdf
Cohen and Solan (2013): "Bandit Problems with Lévy Processes" link (In which a continuous-time bandit problem is considered)
Cherkassky and Bornn (2013): "Sequential Monte Carlo Bandits" pdf
Papadimitriou and Tsitsiklis (1999): "The Complexity of Optimal Queuing Network Control" pdf
Garivier and Cappé (2011): "The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond" link
Hillel et al. (2012): "Distributed Exploration in Multi-Armed Bandits" pdf
summary
This paper presents and analyzes three algorithms for distributed n-player cooperative multi-armed bandit problems. The first algorithm identifies the best arm, and the second identifies an epsilon-best arm, each in a one-shot protocol with an individual exploration phase followed by an aggregation phase. Both algorithms are quite simple, bootstrapping on existing single-player bandit algorithms by adding a voting phase, and both are optimal in the number of pulls required per player: they achieve a sqrt(n) speedup over the naive single-player algorithm, matching a lower bound proved in the paper. The third algorithm, which uses multiple rounds of communication, is more involved: the group iteratively eliminates arms from consideration over a series of rounds. It achieves the full speedup of n over the naive algorithm, but requires 1/epsilon rounds of communication, which is not known to be optimal.
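
The one-shot explore-then-vote structure of the first two algorithms is easy to sketch. Below is a minimal illustration under assumed parameters (Bernoulli arms and a fixed per-arm pull budget; the paper instead sets budgets as a function of epsilon and the confidence level): each player explores independently, nominates its empirically best arm, and a single round of communication aggregates the nominations by majority vote.

    import random
    from collections import Counter

    def best_arm_vote(arm_means, n_players, pulls_per_arm, rng):
        votes = Counter()
        for _ in range(n_players):
            # Exploration phase: each player estimates every arm on its own.
            estimates = [
                sum(rng.random() < mu for _ in range(pulls_per_arm)) / pulls_per_arm
                for mu in arm_means
            ]
            votes[max(range(len(arm_means)), key=estimates.__getitem__)] += 1
        # Aggregation phase: one communication round, one arm index per player.
        return votes.most_common(1)[0][0]

    rng = random.Random(1)
    print(best_arm_vote([0.3, 0.5, 0.7, 0.65], n_players=8,
                        pulls_per_arm=50, rng=rng))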

Scott (2010): "A modern Bayesian look at the multi-armed bandit" pdf (In which a probability matching rule is applied to the bandit problem)
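
A minimal sketch of the probability matching rule (Thompson sampling) for a Bernoulli bandit: sample a success probability for each arm from its Beta posterior and pull the arm with the largest sample, so each arm is played with probability equal to its posterior probability of being the best. The arm means, horizon, and uniform priors below are illustrative assumptions, not the paper's setup.

    import random

    def thompson(arm_means, horizon, rng):
        k = len(arm_means)
        alpha = [1.0] * k  # Beta(1, 1) prior on each arm's success rate
        beta = [1.0] * k
        total = 0.0
        for _ in range(horizon):
            # Probability matching: draw one posterior sample per arm and
            # play the arm whose sample is largest.
            samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
            arm = max(range(k), key=samples.__getitem__)
            reward = 1.0 if rng.random() < arm_means[arm] else 0.0
            alpha[arm] += reward
            beta[arm] += 1.0 - reward
            total += reward
        return total

    rng = random.Random(2)
    print(thompson([0.2, 0.5, 0.8], horizon=1000, rng=rng))
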
*Tekin and van der Schaar (2013): "Distributed Online Learning via Cooperative Contextual Bandits" link

Distributed Optimization

Macready and Wolpert (2005): "Distributed Constrained Optimization" pdf
Niu et al. (2011): "Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent" pdf

Peter M Krafft Last modified: Tue Sep 16 17:28:17 EDT 2014