Optimization

*Wolpert and Macready (1997): "No Free Lunch Theorems for Optimization" pdf

Bandits

*Badanidiyuru et al. (2013): "Bandits with Knapsacks" link
Boyce et al. (2009): "Be My Guinea Pig: Information Spillovers in a One-Armed Bandit Game" link
summary
This paper studies a simple collaborative bandit game in which information is shared publicly, so that exploration acts as a public good, and tests the extent to which humans play the Nash equilibrium in this context. The experiment consists of a two-player, two-period, one-armed bandit problem in which each player also has a safe option, with the safe return possibly varying across players (are the safe returns commonly known?). The authors identify systematic deviations from optimal play, including intermediate levels of free-riding and sticky choice behavior.
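
A minimal simulation sketch of this game structure (not the paper's exact experimental design): the risky arm pays 1 with unknown probability p under a Beta(1,1) prior, all outcomes are public, and each player follows a myopic rule that pulls the risky arm only when its posterior mean beats that player's safe return. The payoff values below are made-up assumptions for illustration.

    import random

    def posterior_mean(successes, pulls):
        # Posterior mean of p under a Beta(1, 1) prior, given the
        # publicly observed history of pulls and successes.
        return (1 + successes) / (2 + pulls)

    def play_game(p, safe, rng):
        # p: true success probability of the risky arm (pays 0 or 1).
        # safe: each player's certain return, possibly unequal.
        successes, pulls = 0.0, 0
        rewards = [0.0, 0.0]
        for period in range(2):
            est = posterior_mean(successes, pulls)  # shared information
            for i in range(2):
                # Myopic rule: pull the risky arm iff its posterior mean
                # beats the safe return. Ignoring the option value of
                # information is what makes free-riding tempting here.
                if est > safe[i]:
                    r = 1.0 if rng.random() < p else 0.0
                    pulls += 1
                    successes += r
                    rewards[i] += r
                else:
                    rewards[i] += safe[i]
        return rewards

    rng = random.Random(0)
    # Player 0's low safe return makes it the natural explorer; player 1
    # can free-ride on the information player 0's pull generates.
    print(play_game(p=0.7, safe=(0.4, 0.6), rng=rng))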

Bull (2013): "Adaptive-treed bandits" pdf
Cohen and Solan (2013): "Bandit Problems with Lévy Processes" link (In which a continuous-time bandit problem is considered)
Cherkassky and Bornn (2013): "Sequential Monte Carlo Bandits" pdf
Papadimitriou and Tsitsiklis (1999): "The Complexity of Optimal Queuing Network Control" pdf
Garivier and Cappé (2011): "The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond" link
Hillel et al. (2012): "Distributed Exploration in Multi-Armed Bandits" pdf
summary
This paper presents and analyzes three algorithms for distributed n-player cooperative multi-armed bandit problems. The first algorithm identifies the best arm, and the second identifies an epsilon-best arm, each in a one-shot protocol with an individual exploration phase followed by an aggregation phase. Both algorithms are quite simple, bootstrapping on existing single-player bandit algorithms by adding a voting phase, and both are optimal in the number of pulls required per player: they achieve a sqrt(n) speedup over the naive single-player algorithm, matching a lower bound proved in the paper. The third algorithm, which uses multiple rounds of communication, is more involved: the group iteratively eliminates arms from consideration over a series of rounds. It achieves the full speedup of n over the naive algorithm, but requires 1/epsilon rounds of communication, which is not known to be optimal.
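
The one-shot explore-then-vote structure of the first two algorithms is easy to sketch. Below is a minimal illustration under assumed parameters (Bernoulli arms and a fixed per-arm pull budget; the paper instead sets budgets as a function of epsilon and the confidence level): each player explores independently, nominates its empirically best arm, and a single round of communication aggregates the nominations by majority vote.

    import random
    from collections import Counter

    def best_arm_vote(arm_means, n_players, pulls_per_arm, rng):
        votes = Counter()
        for _ in range(n_players):
            # Exploration phase: each player estimates every arm on its own.
            estimates = [
                sum(rng.random() < mu for _ in range(pulls_per_arm)) / pulls_per_arm
                for mu in arm_means
            ]
            votes[max(range(len(arm_means)), key=estimates.__getitem__)] += 1
        # Aggregation phase: one communication round, one arm index per player.
        return votes.most_common(1)[0][0]

    rng = random.Random(1)
    print(best_arm_vote([0.3, 0.5, 0.7, 0.65], n_players=8,
                        pulls_per_arm=50, rng=rng))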

Scott (2010): "A modern Bayesian look at the multi-armed bandit" pdf (In which a probability matching rule is applied to the bandit problem)
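
A minimal sketch of the probability matching rule (Thompson sampling) for a Bernoulli bandit: sample a success probability for each arm from its Beta posterior and pull the arm with the largest sample, so each arm is played with probability equal to its posterior probability of being the best. The arm means, horizon, and uniform priors below are illustrative assumptions, not the paper's setup.

    import random

    def thompson(arm_means, horizon, rng):
        k = len(arm_means)
        alpha = [1.0] * k  # Beta(1, 1) prior on each arm's success rate
        beta = [1.0] * k
        total = 0.0
        for _ in range(horizon):
            # Probability matching: draw one posterior sample per arm and
            # play the arm whose sample is largest.
            samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
            arm = max(range(k), key=samples.__getitem__)
            reward = 1.0 if rng.random() < arm_means[arm] else 0.0
            alpha[arm] += reward
            beta[arm] += 1.0 - reward
            total += reward
        return total

    rng = random.Random(2)
    print(thompson([0.2, 0.5, 0.8], horizon=1000, rng=rng))
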
*Tekin and van der Schaar (2013): "Distributed Online Learning via Cooperative Contextual Bandits" link

Distributed Optimization

Macready and Wolpert (2005): "Distributed Constrained Optimization" pdf
Niu et al. (2011): "Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent" pdf

Peter M Krafft Last modified: Tue Sep 16 17:28:17 EDT 2014