Random Features as an Alternative to the Kernel Trick

Ali Rahimi ()
with Benjamin Recht
(last modified 19 Sep, 2007)


About Random Features

Random Features are a trick for learning non-parameteric classifiers and regressors given huge (millions of data points) data sets.

What do they do?

There are many ways to construct random features. In a recent paper, we suggest deriving random features that approximate the positive definite kernels that used in Kernel Machines. This lets us efficiently leverage the performance of state of the art Kernel Machines on very large data sets.

Kernel machines (such as the Kernelized Support Vector Machine) are state-of-the-art architectures for classifying data, but training them is slow. They can rarely be trained with more than ~50,000 training examples in practice. To scale training to millions of examples, we utilize a linear version of the machine, wich can easily scale to millions of examples. To do this, we pass the data through random features, and train a linear machine on the featurized data. The random features are designed to ensure that the classifier generated in this way is approximately the same as the classifierd returned by the original (slow-to-train) kernel machine.

While the kernel trick lets us do linear things in infinite-dimensional space without an infinite amount of work, it still incurs a high computational cost. Random Features provide a finite-dimensional alternative to the Kernel Trick by instead mapping the data to an equivalent randomized feature space. For a given shift-invariant kernel we can find a -dimensional random function so that

The following table compares our results agains the Core Vector Machine

Comparison of testing error and training time between ridge regression with random features, Core Vector Machine, and various state-of-the-art exact methods reported in the literature. For classification tasks, the percent of testing points incorrectly predicted is reported. For regression tasks, the RMS error normalized by the norm of the ground truth is reported.

Links


Random Features for Large-Scale Kernel Machines, Ali Rahimi, Ben Recht,
to appear in Neural Information Processing Systems (NIPS) 2007. (pdf)

Here is MATLAB code for Random Features. And some datasets and sample runs that appear in the above paper (many thanks to the authors of the Core Vector Machine for supplying formatted versions of many of these). There may be bugs in the code. If you have trouble getting state-of-the-art classification/regression results with this code, please drop me a note and I'll help debug.