Transfer Learning
Transfer learning involves two interrelated learning problems with the goal of using knowledge about one set of tasks to improve performance on a related task. Standard machine learning algorithms rely on inductive bias provided by a person, whereas "transfer-aware" algorithms incorporate inductive bias learned from one or more auxiliary tasks. The graphical model at left illustrates how transfer can be accomplished in a Bayesian framework by linking two data sources with a common hyper-distribution.
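As a toy illustration of that linkage, the sketch below draws the head-probabilities of two coins, an auxiliary source A and a target source B, from one shared Beta hyper-distribution (the same coin-tossing example used in the movie further down this page). The variable names and numeric values are illustrative assumptions, not the model used in the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared hyper-distribution: a Beta with hyper-mean m and weight s.  Both the
# auxiliary source A and the target source B draw their lower-level parameter
# (the probability of heads) from this one distribution.
m, s = 0.7, 50.0                       # assumed hyperparameter values
alpha, beta = m * s, (1.0 - m) * s     # Beta parameters implied by (m, s)

theta_A = rng.beta(alpha, beta)        # lower-level parameter for source A
theta_B = rng.beta(alpha, beta)        # lower-level parameter for source B

data_A = rng.random(200) < theta_A     # plentiful auxiliary observations
data_B = rng.random(10) < theta_B      # scarce target observations

# With this structure, data from A inform the posterior over (m, s), which in
# turn shifts the posterior over theta_B even when B's own data are scarce.
print(theta_A, data_A.mean())
print(theta_B, data_B.mean())
```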
For this project, we developed a "transfer-aware" version of the naive
Bayes classification algorithm, implemented using an extension of the
slice sampling method by Radford Neal (2003). We tested the algorithm
on a meeting acceptance task, where the goal was to predict whether a
person would accept or reject a meeting invitation given previously
gathered information about the person's schedule and relationships.
Twenty-one individuals participated and supplied a total of 3966
labeled examples. Each example was represented using 15 features that
captured relational information about the inviter, proposed meeting,
etc.
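The sampler itself is not shown on this page. For reference, the sketch below is a generic univariate slice sampler with the stepping-out and shrinkage procedures described by Neal (2003), written in Python/NumPy; it illustrates only the basic method, not the transfer-aware extension used in the experiments, and the function name and defaults are illustrative.

```python
import numpy as np

def slice_sample(log_f, x0, w=1.0, n_samples=1000, seed=0):
    """Univariate slice sampler with stepping out and shrinkage (Neal, 2003)."""
    rng = np.random.default_rng(seed)
    samples = np.empty(n_samples)
    x = x0
    for i in range(n_samples):
        # Draw the auxiliary height that defines the horizontal slice.
        log_y = log_f(x) + np.log(rng.uniform())
        # Step out until [left, right] brackets the slice.
        left = x - w * rng.uniform()
        right = left + w
        while log_f(left) > log_y:
            left -= w
        while log_f(right) > log_y:
            right += w
        # Sample uniformly within the bracket, shrinking it on rejection.
        while True:
            x_new = rng.uniform(left, right)
            if log_f(x_new) > log_y:
                x = x_new
                break
            if x_new < x:
                left = x_new
            else:
                right = x_new
        samples[i] = x
    return samples

# Example: draw from a standard normal using its unnormalized log-density.
draws = slice_sample(lambda x: -0.5 * x * x, x0=0.0, n_samples=5000)
```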
The learning curves below illustrate that, not surprisingly, the benefits of transfer learning depend on the similarity of the two data sources. Note that filled circles denote statistically significant differences (p<0.05) from the corresponding "B-only" baseline value. The inset graph shows that overall, many pairs of individuals result in statistically significant positive transfer (blue bars) compared with the transfer-unaware, B-only algorithm. However, about as many pairs yield negative transfer (red bars) at the smallest "B" training sizes. Not explicitly shown is the nearly identical performance of both algorithms at larger training sizes. In summary, hierarchical Bayesian methods are well-suited for transfer learning because they can avoid negative transfer by detecting differences between data sources. For some tasks, however, the remaining challenge is to detect negative transfer with very little data from the target source.
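The page does not record which statistical test produced those p-values. Purely as an assumed example, one simple way to place such a marker at a single "B" training size is a paired t-test over matched train/test splits of the two algorithms' accuracies:

```python
import numpy as np
from scipy import stats

# Hypothetical accuracies from five matched train/test splits at one "B"
# training size; in practice these would come from the transfer-aware and
# B-only algorithms run on identical splits.
acc_transfer = np.array([0.81, 0.79, 0.83, 0.80, 0.82])
acc_b_only   = np.array([0.76, 0.78, 0.77, 0.75, 0.79])

t_stat, p_value = stats.ttest_rel(acc_transfer, acc_b_only)
significant = p_value < 0.05   # e.g., the criterion behind a filled circle
print(p_value, significant)
```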
Related Publications
To transfer or not to transfer
M.T. Rosenstein, Z. Marx, L.P. Kaelbling, and T.G. Dietterich. To appear, NIPS 2005 Workshop on Inductive Transfer: 10 Years Later, 2005. [pdf]

Transfer learning with an ensemble of background tasks
Z. Marx, M.T. Rosenstein, L.P. Kaelbling, and T.G. Dietterich. To appear, NIPS 2005 Workshop on Inductive Transfer: 10 Years Later, 2005. [pdf]
Movies
File: hyperposterior.gif (245 KB)

Animated GIF that illustrates how the arrival of training data changes
the posterior distribution of the hyperparameters. With no data, the
hyperprior is uniformly distributed along the horizontal
axis but somewhat concentrated about a mean of 50 along the vertical
axis. (See the figure below.) This is equivalent to setting a
hyperprior that encourages transfer but is otherwise noninformative
about the mean value of the lower-level parameter. (In this simple
example, the lower-level parameter is the probability of observing
heads when tossing a biased coin.) Roughly speaking, the value along
the vertical axis acts as a weight that determines how to combine the
lower-level data with the higher-level mean value. As auxiliary data
arrive, the distribution becomes concentrated about the mean value for
the auxiliary data. Then, when target data arrive, the distribution
changes dramatically for the case where the two data sources are very
different. Note that each frame of the animation shows one target data
source (with a mean value depicted by the "B" vertical line) and two
different auxiliary data sources (with means depicted by the "A"
vertical lines).
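
As a rough companion to the animation (not the project's actual code), the sketch below evaluates such a hyperposterior on a grid for the coin-tossing example, using the Beta-Binomial marginal likelihood of each data source. The counts, grid ranges, and the spread of the hyperprior on the weight are assumptions chosen only to mimic the setup described above.

```python
import numpy as np
from scipy.special import betaln

# (heads, tosses) for each data source; made-up counts standing in for the
# "A" and "B" sources shown in the animation.
sources = [(38, 50), (4, 10)]

m_grid = np.linspace(0.01, 0.99, 99)    # hyper-mean (horizontal axis)
s_grid = np.linspace(1.0, 100.0, 100)   # weight (vertical axis)
M, S = np.meshgrid(m_grid, s_grid)
alpha, beta = M * S, (1.0 - M) * S      # Beta parameters implied by (m, s)

# Hyperprior: uniform in the mean m (constant term omitted) and concentrated
# about a weight of 50, as described above; the spread of 15 is assumed.
log_post = -0.5 * ((S - 50.0) / 15.0) ** 2

# Add each source's Beta-Binomial marginal log-likelihood (up to constants).
for heads, tosses in sources:
    log_post += betaln(alpha + heads, beta + tosses - heads) - betaln(alpha, beta)

post = np.exp(log_post - log_post.max())
post /= post.sum()                      # normalized hyperposterior on the grid
```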
updated 12-Dec-2005