Abstract

Active Learning for Sampling in Time-Series Experiments: With Applications to Gene Expression Analysis

Rohit Singh, Nathan Palmer, David Gifford, Bonnie Berger, and Ziv Bar-Joseph

Many time-series experiments seek to estimate some signal as a continuous function of time. In this paper, we address the sampling problem for such experiments: determining which time-points ought to be sampled in order to minimize the cost of data collection. We restrict our attention to a growing class of experiments that measure multiple signals at each time-point and in which raw materials or observations are archived initially and selectively analyzed later, the analysis being the more expensive step. We present an active learning algorithm for iteratively choosing time-points to sample, using the uncertainty in the quality of the currently estimated time-dependent curve as the objective function. Our method can handle multiple signals per time-point. By relying on Local Cross Validation (LCV), our algorithm handles both uniform and non-uniform response rates. Using simulated data as well as gene expression data, we show that our algorithm performs well and can significantly reduce experimental cost without loss of information.
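To make the iterative selection loop concrete, the following is a minimal sketch and not the authors' implementation. It assumes a sorted list of archived time-points (at least three), a hypothetical `measure` callback that performs the expensive analysis and returns one value per signal, and a budget of at least three measurements. As a stand-in for the curve-quality uncertainty and for LCV, it uses a simple leave-one-out error under linear interpolation; the function names `loo_error` and `choose_time_points` are illustrative only.

```python
from typing import Callable, List, Sequence


def loo_error(ts: Sequence[float], ys: Sequence[Sequence[float]], i: int) -> float:
    """Leave-one-out error at interior sampled index i.

    Predict the signals at ts[i] by linear interpolation between the two
    neighbouring sampled points, then average the absolute error over signals.
    This is an assumed proxy for local curve uncertainty.
    """
    t0, t1, t2 = ts[i - 1], ts[i], ts[i + 1]
    w = (t1 - t0) / (t2 - t0)
    errs = [abs((1 - w) * a + w * b - y)
            for a, y, b in zip(ys[i - 1], ys[i], ys[i + 1])]
    return sum(errs) / len(errs)


def choose_time_points(archive: Sequence[float],
                       measure: Callable[[float], Sequence[float]],
                       budget: int) -> List[float]:
    """Greedily pick archived time-points to analyze, up to `budget`."""
    # Seed with the two endpoints and the middle archived point.
    seeds = (archive[0], archive[len(archive) // 2], archive[-1])
    sampled = {t: measure(t) for t in seeds}

    while len(sampled) < min(budget, len(archive)):
        ts = sorted(sampled)
        ys = [sampled[t] for t in ts]
        # Endpoints have no two-sided neighbours, so their error is set to 0.
        err = [0.0] + [loo_error(ts, ys, i) for i in range(1, len(ts) - 1)] + [0.0]

        best, best_score = None, -1.0
        for i in range(len(ts) - 1):
            # Archived but not yet analyzed points inside this interval.
            gap = [t for t in archive if ts[i] < t < ts[i + 1]]
            if not gap:
                continue
            # Score an interval by the larger local error at its two ends.
            score = max(err[i], err[i + 1])
            if score > best_score:
                mid = (ts[i] + ts[i + 1]) / 2
                best, best_score = min(gap, key=lambda t: abs(t - mid)), score

        if best is None:  # nothing left to sample
            break
        sampled[best] = measure(best)

    return sorted(sampled)
```

Because each signal contributes to the averaged error, the sketch handles multiple signals per time-point, and because the error is computed locally, intervals where the interpolated curve fits poorly attract more samples than intervals where it fits well.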

URL: http://theory.csail.mit.edu/tsample